1. Basis of the report
language in which it was filed, unless otherwise indicated under this item.
Authority (Rule 23.1(b)).
international application as filed has been furnished.

[X] the statement that the information recorded in computer readable form is identical to the written sequence listing has been



2. [ ] Certain claims were found unsearchable (see Box I).
3. [X] Unity of invention is lacking (see Box II).



4. With regard to the title, 

[ ] the text is approved as submitted by the applicant.

[X] the text has been established by this Authority to read as follows:

DNA MANIPULATION METHODS, APPLICATIONS FOR SYNTHETIC ENZYMES AND USE FOR 
POLYKETIDE PRODUCTION. 

5. With regard to the abstract, 

[X] the text is approved as submitted by the applicant.

[ ] the text has been established, according to Rule 38.2(b), by this Authority as it appears in Box III. The applicant may,
within one month from the date of mailing of this international search report, submit comments to this Authority.

6. The figure of the drawings to be published with the abstract is Figure No. 



[ ] as suggested by the applicant. [X] None of the figures.

[ ] because the applicant failed to suggest a figure.

[ ] because this figure better characterizes the invention.
because they relate to subject matter not required to be searched by this Authority, namely:



2. Claims Nos.: 

because they relate to parts of the International Application that do not comply with the prescribed requirements to such 

an extent that no meaningful International Search can be carried out, specifically: 



3. Claims Nos.: 

because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 6.4(a). 

Box II Observations where unity of invention is lacking (Continuation of item 2 of first sheet) 

This International Searching Authority found multiple inventions in this international application, as follows: 

see additional sheet 



1 . r~~| As all required additional search fees were timely paid by the applicant, this International Search Report covers all 
searchable claims. 



2. P J As all searchable claims could be searched without effort justifying an additional fee, this Authority did not invite payment 
of any additional fee. 



3. | I As only some of the required additional search fees were timely paid by the applicant, this International Search Report 
■ ' covers only those claims for which fees were paid, specifically claims Nos.: 



4. T ] No required additional search fees were timely paid by the applicant. Consequently, this International Search Report is 
— restricted to the invention first mentioned in the claims; it is covered by claims Nos.: 
This International Searching Authority found multiple (groups of)
inventions in this international application, as follows: 

1. Claims: (1-8,32,33,37,38)-complete, (16-31,36,41-43,
46-48)-partially 



A method of assembling several DNA units in sequence in a 
DNA construct, which method comprises the step of: a) 
providing each DNA unit with a restriction enzyme 
recognition sequence at its 5' end and with a recognition 
sequence for the same restriction enzyme at its 3' end that 
is combined with a restriction site for a DNA modification 
enzyme, b) providing a starting DNA construct having an 
accessible restriction site for the same or a compatible 
restriction enzyme and cleaving the starting DNA construct 
with a restriction enzyme, c) inserting the desired DNA unit 
and bringing the ligated product into contact with a DNA 
modification enzyme such that the restriction site at the 3' 
end of the inserted DNA unit is abolished, d) cleaving the 
ligated product at an accessible unmodified recognition site 
for the same or a compatible restriction enzyme, e) 
repeating steps c) and d) to introduce each desired DNA unit
to give a DNA construct containing all the desired units in 
sequence; DNA construct incorporating one or more DNA 
assemblies encoding synthetic enzymes and/or hosts 
expressing DNA constructs made by said method; compounds 
produced by synthetic enzymes encoded by said DNA 
assemblies; a method of synthesising a target molecule using 
said method; a method of making a synthetic enzyme to 
catalyse the synthesis of a target molecule using said 
method; a library of DNA units encoding catalytic or
transport protein domains, wherein each DNA unit has a
recognition sequence for a restriction enzyme at its 5'-end 
and a second recognition sequence for the same or a 
compatible enzyme at its 3'-end which incorporates a 
recognition sequence for a DNA modifying enzyme; 
a module comprising a DNA sequence encoding a functional set 
of polyketide synthetic domains wherein the module has a 
recognition sequence for a restriction enzyme at its 5'-end 
and a second recognition sequence for the same or a 
compatible enzyme at its 3'-end which incorporates a 
recognition sequence for a DNA modifying enzyme; 
a method of transforming a host with one or more synthetic 
DNA assemblies encoding enzyme domains, wherein the DNA 
assemblies are said modules; 



2. Claims: (9-15,33,34,39,40)-complete, (16-31,36,41-43, 
46-48)-partially 



Idem as invention 1, but limited to a method of: assembling 
several DNA units in sequence in a DNA construct, which 
method comprises the step of: a) providing a first DNA unit
with a recognition sequence for a first restriction enzyme
at its 3' end, and cleaving the said first DNA unit with 
said first restriction enzyme, b) providing each other DNA 
unit with a recognition sequence at its 5' end for a second 
restriction enzyme which has a compatible ligation sequence 
with that of the first restriction enzyme, and a downstream 
recognition sequence for said first restriction enzyme 
followed by a downstream recognition sequence for a third 
restriction enzyme at its 3' end, and cleaving each said 
other DNA unit with the second and third restriction 
enzymes, c) ligating the said first DNA unit with a desired 
other DNA unit to form a ligated product such that the
ligation of the two units abolishes the recognition site for 
the first restriction enzyme at the ligation junction, and 
cleaving the ligated product with said first restriction 
enzyme, d) ligating the product from c) with a desired DNA
unit from b) to form a ligated product and cleaving the
ligated product with said first restriction enzyme,
e) repeating step d) with each other DNA unit in turn so as
to assemble the DNA units in sequence;



3. Claims: 44-45 

A method of transforming a host with one or more synthetic 
DNA assemblies encoding enzyme domains which comprises the 
step of: a) Inserting said DNA assembly into a vector 
containing a mutated internal fragment of a recA gene 
sequence such that the vector is capable of undergoing 
homologous recombination with the recA gene of the host, b) 
bringing said vector into contact with a host chromosome 
under conditions which permit homologous recombination to 
take place, c) disrupting the host recA gene by integration 
of the DNA of said vector into the chromosome; said method 
wherein the expression vector is used to transform a 
Streptomyces host; 
5. □ This report has been established as if (some of) the amendments had not been made, since they have been

considered to go beyond the disclosure as filed (Rule 70.2(c)): 

(Any replacement sheet containing such amendments must be referred to under item 1 and annexed to this 
report.) 

6. Additional observations, if necessary: 

IV. Lack of unity of invention 

1. In response to the invitation to restrict or pay additional fees the applicant has:

□ restricted the claims.
☒ paid additional fees.

□ paid additional fees under protest. 

□ neither restricted nor paid additional fees. 

2. □ This Authority found that the requirement of unity of invention is not complied with and chose, according to Rule

68.1, not to invite the applicant to restrict or pay additional fees.

3. This Authority considers that the requirement of unity of invention in accordance with Rules 13.1, 13.2 and 13.3 is 

□ complied with. 

□ not complied with for the following reasons: 

4. Consequently, the following parts of the international application were the subject of international preliminary 
examination in establishing this report: 

☒ all parts.

□ the parts relating to claims Nos. . 

V. Reasoned statement under Article 35(2) with regard to novelty, inventive step or industrial applicability; 
citations and explanations supporting such statement 

1. Statement 
Form PCT/IPEA/409 (Boxes l-VIII, Sheet 2) (July 1998) 



INTERNATIONAL PRELIMINARY 
EXAMINATION REPORT 



International application No. PCT/GB00/02286 



Novelty (N)                      Yes:  Claims 1-21, 27, 28, 31-43, 46, 47
                                 No:   Claims 22-26, 29, 30, 44, 45, 48

Inventive step (IS)              Yes:  Claims 1-21, 27, 28, 31
                                 No:   Claims 22-26, 29, 30, 32-48

Industrial applicability (IA)    Yes:  Claims 1-48
                                 No:   Claims



2. Citations and explanations 
see separate sheet 



VIII. Certain observations on the international application 

The following observations on the clarity of the claims, description, and drawings or on the question whether the 
claims are fully supported by the description, are made: 
see separate sheet 
Item V.

Reference is made to the following documents: 

D1: WO 98 17811 A (CHROMAXOME CORP) 30 April 1998 (1998-04-30)
D2: ROWE C J ET AL: 'Construction of new vectors for high-level expression in
    actinomycetes' GENE, NL, ELSEVIER BIOMEDICAL PRESS, AMSTERDAM,
    vol. 216, no. 1, August 1998 (1998-08), pages 215-223, XP004149299
    ISSN: 0378-1119, cited in the application
D3: WO 98 49315 A (KOSAN BIOSCIENCES INC; UNIV LELAND STANFORD
    JUNIOR (US)) 5 November 1998 (1998-11-05)
D4: WO 96 40968 A (UNIV LELAND STANFORD JUNIOR; JOHN INNES
    CENTRE (GB)) 19 December 1996 (1996-12-19)
D5: MCDANIEL R ET AL: 'Multiple genetic modifications of the erythromycin
    polyketide synthase to produce a library of novel unnatural natural products'
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF USA,
    NATIONAL ACADEMY OF SCIENCE, WASHINGTON, US, vol. 96, no. 5,
    March 1999 (1999-03), pages 1846-1851, XP002143433 ISSN: 0027-8424
D6: MUTH G ET AL: 'Mutational analysis of Streptomyces lividans recA gene
    suggests that only mutants with residual activity remain viable.'
    MOLECULAR & GENERAL GENETICS, vol. 255, no. 4, 1997, pages 420-428,
    XP002160032 ISSN: 0026-8925
D7: EP-A-0 841 402 (NAT INST AGROBIO RES) 13 May 1998 (1998-05-13)
D8: US-A-4 963 487 (SCHIMMEL PAUL R) 16 October 1990 (1990-10-16)
D9: US-A-4 713 337 (JASIN MARIA ET AL) 15 December 1987 (1987-12-15)

I) The methods of assembling several DNA units according to present claims 1-21 
appear to be novel and to involve an inventive step (Article 33.2 and 3 PCT) in 
view of the prior art documents cited in the International Search Report. 
Likewise, the methods according to claims 27, 28 and 31 fulfil the requirements
of Article 33.2 and 3 PCT, since they involve the use of any one of the methods of
claims 1-21.
II) However, the products according to claims 22-26, obtained by the methods of
claims 1-21, are not different from the same products disclosed in the prior art,
although said prior art products have been obtained by other methods.

Thus, the disclosures of D1 (see the claims, fig. 5E and example 5.5.5), D2 (see 
abstract), D3, D4 and D5 (see abstract) are novelty destroying for the subject- 
matter of present claims 22-26 (Article 33.2 PCT). 

The same reasoning applies mutatis mutandis for the subject-matter of present 
claims 29 and 30 which also lack novelty over D1-D5 (Article 33.2 PCT). 

III) The method according to claim 44 (i.e. integration of expression plasmids into a 
host using a mutated internal fragment of the recA gene as the region for 
homologous recombination) was known per se in the art as shown by documents 
D6-D9 (see the passages mentioned in the ISR). Claim 44 indicates that the host 
is transformed with " one or more synthetic DNA assemblies encoding enzyme 
domains ". Any DNA sequence encoding an enzyme falls under this definition. 
Thus, D6-D9 which disclose said method with plasmids containing sequences 
encoding other enzymes (for instance the reg sequence of D6, see fig. 2A) 
destroy the novelty of claim 44. 

Document D6 discloses said method to transform a Streptomyces host. Thus, D6 
also anticipates claim 45 (Article 33.2 PCT). 

IV) Claim 48 lacks novelty and inventive step under Article 33.2 and 3 PCT, for 
obvious reasons. 

V) 1) In the light of the prior art, the problem underlying part of the present application 

(claims 1-21) can be defined as the provision of further alternative methods of 
assembling several DNA units in sequence. Two methods (claims 1-8 and claims 
9-15) involving an inventive step offer two alternative solutions to this problem. 
However, the libraries and modules according to claims 32-36 and 37-41,
respectively, are merely intermediate products which may or may not be
used in the above mentioned methods. These products per se do not provide any 
solution to the underlying problem and therefore cannot be considered as 
involving an inventive step (Article 33.3 PCT). 
2) Claims 42, 43, 46, 47 do not contain any features which, in combination with the
features of any claim to which they refer, meet the requirements of the PCT in 
respect of inventive step (Article 33.3 PCT). 

Item VIII. 

The passage "(see ... paper)" in claim 16 is not allowable under Article 6 PCT. 
(54) Title: DNA MANIPULATION METHODS AND APPLICATIONS FOR SYNTHETIC ENZYMES

(57) Abstract: The invention comprises a method of assembling several DNA units in sequence in a DNA construct and all
derivatives of this method. In particular the production of synthetic enzymes is contemplated. Each DNA unit is provided with the same
restriction enzyme recognition site at its 5' and 3' ends, the restriction recognition site at its 3' end being combined with a
recognition site for a DNA modification enzyme. A DNA construct having the same or a compatible accessible restriction site, as provided
in the DNA unit, is cleaved at the restriction site by the appropriate restriction enzyme. The desired DNA unit is then inserted into
the DNA construct, this ligated product subsequently being brought into contact with a DNA modification enzyme such that the
restriction site at the 3' end of the inserted DNA unit is abolished. The ligated product is then cleaved at the remaining unmodified
restriction recognition site and a subsequent DNA unit is inserted. This process is repeated, introducing each desired DNA unit to
give a DNA construct containing all the desired units in sequence.



WO 00/77181

PCT/GB00/02286



DNA MANIPULATION METHODS AND APPLICATIONS FOR 
SYNTHETIC ENZYMES. 

Background

Polyketides, including the valuable drugs avermectin, erythromycin and
rapamycin, are natural products that are synthesised by stepwise
condensation of acetate, propionate and occasionally butyrate units. The
enzymes that take part in the biosynthesis of polyketide chains are
collectively known as the polyketide synthase (PKS). PKSs include
examples of both type I (multifunctional enzyme) and type II (dissociable
complex) organisation. The sequencing of the gene clusters encoding the
erythromycin- (ery) and rapamycin- (rap) producing polyketide synthases
has shown that each cycle of polyketide chain extension is catalysed by a
different set or 'module' of enzyme activities, housed in a few very large
multienzyme polypeptides. The basic building blocks of modules are
enzymatic 'domains' that are covalently linked together. The ability of
these domains to act upon the carbon chain and remove/add functionalities
is reminiscent of a molecule being acted upon by chemical reagents in a
chemical synthesis. The aim is therefore to assemble these domains, or
even modules, in any desired manner, so that the linked enzymes can carry
out efficient synthesis of any target molecule. Until now, however, it has
not been possible to find a versatile methodology to assemble these PKS
units.

The whole area of polyketide research is at a stage where the
flexibility of the whole enzymatic machinery is understood, despite the lack
of any X-ray crystal structure data on these giant enzymes, but it remains
difficult to "re-assemble" the enzymes de novo. A de novo synthesis is
desirable for two reasons. Firstly, one does not need to change the
structure of, for example, an antibiotic using tedious chemical
methodologies that are time-consuming and expensive. Engineering a
synthetic enzyme at the genetic level is much easier, faster and cheaper.
As more and more antibiotics are rendered useless, simply because the
bacteria they were active against have developed ways in which to become
resistant to these drugs, there is an urgency to keep developing altered
drug structures. Secondly, there is an ever-growing need for new drugs,
more potent in their action than their predecessors. Whilst nature provides
a large proportion of the new molecules that are, for example, antibiotic,
anticholesterol, antifungal, or anti-cancer, the complicated structures of
these drugs (for example the anti-cancer Taxol) make it increasingly
difficult for chemists to carry out conventional syntheses. The problem is
made more difficult by the fact that the genes that make these drugs cannot
always be isolated.

The isolation of the genes coding for the proteins that make the
highly potent anti-cancer compound Taxol has not as yet been reported.
The resulting choice for obtaining Taxol is either to cut down 200 Pacific
Yew trees to obtain enough Taxol for one chemotherapy session, or to
make the drug chemically using one of the many exceedingly expensive
and long chemical routes that have appeared recently in the literature.

With the isolation, cloning and sequencing of the genes coding for
the erythromycin polyketide synthases, a model for the functioning of
modular type I PKSs began to emerge. It was clear that such a system is
genetically programmed to carry out the necessary catalytic activities
needed for processing of the polyketide chain. It is hypothesised that each
domain acts independently on the progressing carbon skeleton and there is
a correlation between the structure of the growing chain and the enzymatic
activities carried out by the enzymes.

The first conclusive proof of such an arrangement came from
experiments done by Donadio et al. (1991, 1993). One such experiment
(1991) involved an in-frame deletion in the ORF3 segment of the erythromycin
chromosome. This deletion eliminated the entire 183 amino acids of the
ketoreductase domain of ery PKS module 5, along with some of the
flanking region (a total of 271 amino acids) and resulted in the production of
5,6-dideoxy-3-α-mycarosyl-5-oxo-erythronolide B, the structure of which
was confirmed by X-ray crystallography. Replacement of two amino acids
in the putative NAD(P)H-binding motif of the enoylreductase domain
encoded by ORF2 resulted in a new macrolide, Δ6,7-anhydroerythromycin C,
being produced, albeit in low yield. These results demonstrated that
erythromycin PKS can be genetically reprogrammed to produce novel
macrolides that would otherwise be difficult to obtain by chemical means.

During the analysis of the fermentation products produced by a
strain of S. erythraea that was genetically engineered to produce an
analogue of 6dEB, it was found that a minor component of the fermentation
was 3,5-dihydroxy-2,4-dimethyl-n-heptanoic acid δ-lactone (Donadio et al.,
1991). This product was predicted to result from premature release of the
chain from either the ACP of module 2 or the KS of module 3. A greater
yield of this triketide product was obtained by heterologous over-expression
of ORF1 in Streptomyces coelicolor (Kao et al., 1994), which also showed
that DEBS1 can function autonomously. More recently (Cortes et al., 1995),
a six-membered lactone was produced through genetically engineering the
PKS. By repositioning the TE (cyclase) domain from module 6 to the
C-terminus of module 2 (end of DEBS1), it was found that the yield of the
lactone increased five-fold, to 10-15 mg/L, as compared to the 1-3 mg/L
obtained by Kao et al.

The relocation of the thioesterase domain at the end of DEBS1 was
the first example demonstrating the efficacy of repositioning domains in
type I modular systems. Since then, numerous such experiments have
been carried out in order to probe further the efficacy of these
multienzymes. The TE domain has been relocated to the end of module 5
and to the end of module 3 (Kao et al., 1995, 1996). In both cases,
the predicted compounds were produced that resulted from truncation of
the progressing polyketide chain. Release of the 12-membered product in
the former case showed that the thioesterase domain can indeed catalyse
ring closure even for less energetically favourable reactions. In the second
experiment, two products were produced, one of them thought to result
from spontaneous decarboxylation.

The first example of a chimaeric polyketide synthase constructed
from a domain taken from a second PKS was demonstrated by Oliynyk et
al. (1996). An acyltransferase domain (AT) from module 2 of the rapamycin
polyketide synthase was used to replace the AT of module 1 in the
DEBS1-TE system. The resulting triketide lactone had a methyl group missing at
position 5 of the six-membered ring. This was expected since the AT of
module 2 of rap PKS (unlike the AT of module 1 of DEBS1) incorporates a
malonyl-CoA extender unit, instead of a methylmalonyl-CoA unit.

Thus, it has been shown that not only can domains residing within a
particular PKS be interchanged or destroyed, but analogous domains can also be
derived from other synthases for the same purpose or for achieving the
required synthetic goal. Such a strategy immediately provides a glimpse of
the manner in which "designer" polyketides can be constructed through
using "off-the-shelf" gene products.

More recently, another hybrid system has been constructed
(Marsden et al., 1998) wherein a complete loading module from the
avermectin PKS has been swapped with the erythromycin loading module,
while keeping the rest of the DEBS modules intact. As expected,
incorporation of butyryl-CoA as well as 2-methylisobutyryl-CoA was seen
and in both cases, the end products contained the above mentioned
residues. A closely-related experiment has been reported by Kuhstoss et
al. (1996) in which the loading module from the platenolide PKS was
replaced with the loading module from tylactone PKS to yield the expected
polyketide product.

It is very clear from the various engineering efforts outlined above
that the aim must now be to exploit the potential for genetic manipulation of
type I (modular) polyketide synthases (PKS) to produce hybrid synthases
that might catalyse the formation of novel secondary metabolites in a
predictable way.

A giant step towards the realisation of this aim would
be to investigate whether these enzymes might be constructed de novo, as
an essential step in developing a truly combinatorial biosynthesis of
complex polyketides.

The 'assembly line' nature of type I polyketide synthases (PKS) that
contain sets (called modules) of structurally similar but functionally different
enzymatic activities (domains) suggests their potential as a source of
"off-the-shelf" enzymatic reagents which can be used to synthesise new and
complex polyketide molecules. Outlined below are methodologies for the
rapid assembly of DNA units encoding such enzyme domains or modules
of enzyme domains.

There are over 40 gene sequences for polyketides that are available
from various databases. In addition there are numerous domains known
from other synthetic enzymes such as, for example, fatty acid synthase
(Joshi and Smith, 1993), peptide synthetases (Eisner et al., 1997) and
hybrid polyketide/peptide synthesising enzymes (Paitan et al., 1999; Shen
et al., 1999). This amounts to a vast library of domains and modules that
cater for a chemical reaction (e.g. stereospecific condensation,
dehydration, etc.), or in the case of a module, a set of chemical reactions.
In order to obtain analogues of a bio-active molecule, research efforts until
now have been focused on strategies that involve either chromosomally
altering the PKS genes that make the particular molecule (McDaniel et al.,
1999) or feeding synthetic intermediates to the PKS (Jacobsen et al., 1997).
Because of the simplified nature of such experiments, these strategies will
remain a fast route towards obtaining a wide variety of drug analogues.
However, in the case of compounds like the highly potent anti-cancer
discodermolide (TerHaar et al., 1996) the only possible means of obtaining
sufficient quantities of the drug is through chemical synthesis. This is
because in such cases, the genes responsible for making these bio-active
molecules have not been isolated. The chemical synthesis of large
molecules having numerous chiral centres, such as discodermolide,
however elegant, is tedious and expensive to scale up (Marshall and
Johns, 1998).


Abbreviations 

In addition to those listed in Biochem. J. (1986) 233, 1-24, the following 
abbreviations have been used: 



6-dEB     6-deoxyerythronolide B
6-MSA     6-methylsalicylic acid
6-MSAS    6-methylsalicylic acid synthase
ACP       acyl carrier protein
AT        β-keto acyl transferase
bp        base pair(s) of DNA
DEBS      6-deoxyerythronolide B synthase
DH        β-hydroxyacyl-ACP dehydratase (dehydratase)
ER        enoyl reductase
FAS       fatty acid synthase
kbp       kilobase pair(s)
KR        β-ketoacyl reductase
KS        β-ketoacyl synthase
ORF       open reading frame
PKS       polyketide synthase
RAPS      rapamycin synthase
TE        thioesterase
The Invention

In one aspect the invention provides a method of assembling several
DNA units in sequence in a DNA construct. This method comprises the
steps of:

a) providing each DNA unit with a restriction enzyme recognition sequence
at its 5' end and with a recognition sequence for the same restriction
enzyme at its 3' end that is combined with a recognition site for a DNA
modification enzyme,

b) providing a starting DNA construct having an accessible restriction site
for the same or a compatible restriction enzyme and cleaving the starting
DNA construct with such a restriction enzyme,

c) inserting the desired DNA unit and bringing the ligated product into
contact with a DNA modification enzyme such that the restriction site at the
3' end of the inserted DNA unit is abolished,

d) cleaving the ligated product at an accessible unmodified recognition site
for the same or a compatible restriction enzyme,

e) repeating steps c) and d) to introduce each desired DNA unit to give a
DNA construct containing all the desired units in sequence.

DNA units can be any desired DNA sequence, though usually they
encode enzyme domains or modules of two or more enzyme domains. The
recognition sequences are usually positioned at the ends of the DNA unit
once the DNA unit has been cut with the relevant enzyme; by this it is
meant that the recognition sequences are adjacent to the coding sequence,
or that they flank the said sequence. An accessible restriction site is herein
defined as a restriction site which is unmodified, such that it can be cleaved
by a restriction enzyme that normally recognises the sequence of the site.
The accessible restriction site is preferably a unique site in the DNA unit or
ligated product. Where there is more than one accessible site present, it is
possible to perform a partial digest, as known in the art, to obtain digested
products in which only the required site is cleaved in the DNA unit. The
DNA modification enzyme employed in the method can be a methylase, for
example the dam methylase of Escherichia coli. Other methylases such as
dcm are also envisaged.

A particular method comprises the steps of:

a) providing each DNA unit with an XbaI recognition sequence
5'XXTCTAGA3' (where XX is not GA) at its 5' end and with an XbaI
recognition sequence 5'GATCTAGA3' at its 3' end,

b) providing a starting DNA construct having an accessible XbaI site and
cleaving the starting DNA construct with XbaI,

c) inserting the desired DNA unit and using a resulting ligated product to
transform a dam+ strain of E. coli,

d) recovering a resulting plasmid and cleaving the plasmid at an accessible
XbaI site with XbaI,

e) repeating steps c) and d) to introduce each desired DNA unit to give a
DNA construct containing all the desired units in sequence.
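
The XbaI/dam cycle described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the applicants' protocol: sequences are plain top-strand strings, the filler `CC` stands in for the unspecified XX dinucleotide, an XbaI site is treated as dam-protected only when it is immediately preceded by GA (the 5'GATCTAGA3' context given in the text), and insert orientation, partial digests and ligation efficiency are ignored. All function names are hypothetical.

```python
import re

XBAI = "TCTAGA"  # XbaI recognition sequence, cut as T^CTAGA


def accessible_xbai_sites(seq):
    """Start indices of XbaI sites that are NOT preceded by GA.

    Assumption: a site in the context GATCTAGA is methylated in a dam+
    host and therefore no longer cleavable; every other site is accessible.
    """
    return [
        m.start()
        for m in re.finditer(f"(?={XBAI})", seq)
        if seq[max(m.start() - 2, 0):m.start()] != "GA"
    ]


def make_unit(body):
    """Flank a unit as in step a): XXTCTAGA at the 5' end, GATCTAGA at the 3' end.

    'CC' is an arbitrary stand-in for XX (any dinucleotide other than GA).
    """
    return "CC" + XBAI + body + "GA" + XBAI


def xbai_fragment(unit):
    """Top strand released by cutting both XbaI sites of an unmethylated unit."""
    first_cut = unit.index(XBAI) + 1   # just after the T of the 5' site
    last_cut = unit.rindex(XBAI) + 1   # just after the T of the 3' site
    return unit[first_cut:last_cut]


def insert_unit(construct, unit):
    """Steps b) to d): cut the single accessible site and ligate the unit in."""
    sites = accessible_xbai_sites(construct)
    if len(sites) != 1:
        raise ValueError("expected exactly one accessible XbaI site")
    cut = sites[0] + 1
    return construct[:cut] + xbai_fragment(unit) + construct[cut:]


if __name__ == "__main__":
    construct = "ATGC" + XBAI + "GCAT"      # hypothetical starting construct
    for body in ["AAAA", "GGGG"]:           # hypothetical unit bodies
        construct = insert_unit(construct, make_unit(body))
        print(construct, accessible_xbai_sites(construct))
```

In this model every round leaves exactly one accessible site, at the 5' end of the most recently inserted unit, so each new unit lands upstream of the unit added in the previous round.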

The recognition sequences for the restriction enzyme and the DNA
modification enzyme employed in the method can be created in the DNA
units prior to cutting with the restriction enzyme, for example by means of a
primer extension reaction. The preferred DNA construct made by the
method can be an expression vector capable of facilitating expression of
the protein encoded by the desired DNA units.

It is also envisaged that the DNA modification can be removed and 
the restriction site re-established by replicating the ligated product in a 
dam- strain of E. coli by means of suitable vectors as known in the art. 

The invention also encompasses DNA unit assemblies where any
given restriction enzyme recognition site can be modified by addition of a
certain combination of nucleotide bases in order for it to be protected.

In a further aspect, the invention provides a method of making an
assembly of several DNA units in sequence which method comprises the
steps of:

a) providing a first DNA unit with a recognition sequence for a first
restriction enzyme at its 3' end, and cleaving the said first DNA unit with
said first restriction enzyme,

b) providing each other DNA unit with a recognition sequence at its 5' end
for a second restriction enzyme which has a compatible ligation sequence
with that of the first restriction enzyme, and an upstream recognition
sequence for said first restriction enzyme and a downstream recognition
sequence for a third restriction enzyme at its 3' end, and cleaving each said
other DNA unit with the second and third restriction enzymes,

c) ligating the said first DNA unit with a desired other DNA unit to form a
ligated product such that the ligation of the two units abolishes the
recognition site for the first restriction enzyme at the ligation junction, and
cleaving the ligated product with said first restriction enzyme,

d) ligating the product from c) with a desired DNA unit from b) to form a
ligated product and cleaving the ligated product with said first restriction
enzyme,

e) repeating step d) with each other DNA unit in turn so as to assemble the
DNA units in sequence.

A particular method comprises the steps of:

a) providing a first DNA unit with an XbaI recognition sequence
5'TCTAGA3' at its 3' end, and cleaving the said first DNA unit with XbaI,

b) providing each other DNA unit with a SpeI recognition sequence
5'ACTAGT3' at its 5' end, and a downstream XbaI recognition sequence
5'TCTAGA3' followed by a downstream SmaI recognition sequence
5'CCCGGG3' at its 3' end, cleaving each said other DNA unit with SpeI and
SmaI, and dephosphorylating the 5' end of the cleaved DNA unit,

c) ligating the said first DNA unit with a desired other DNA unit to form a
ligated product and cleaving the ligated product with XbaI,

d) ligating the product from c) with a desired DNA unit from b) to form a
ligated product and cleaving the ligated product with XbaI,

e) repeating step d) with each other DNA unit in turn so as to assemble the
DNA units in sequence.
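
The cycle of steps a) to e) can be followed on paper with a small Python
sketch. It is an illustration only: DNA is modelled as its top strand, the
function names are hypothetical, the unit bodies are assumed to contain no
internal XbaI, SpeI or SmaI sites, and dephosphorylation is not modelled.

```python
# Sketch only: DNA is modelled as its top strand; names are illustrative.
XBAI = "TCTAGA"  # cut T^CTAGA, leaving a CTAG overhang
SPEI = "ACTAGT"  # cut A^CTAGT, leaving the same CTAG overhang
SMAI = "CCCGGG"  # cut CCC^GGG, blunt


def cut_at_xbai(chain: str) -> str:
    """Cut at the single XbaI site and keep the upstream piece (ends in T)."""
    if chain.count(XBAI) != 1:
        raise ValueError("expected exactly one XbaI site")
    return chain[: chain.index(XBAI) + 1]


def prepare_incoming(body: str) -> str:
    """Build SpeI-body-XbaI-SmaI, then cut it with SpeI and SmaI (step b)."""
    unit = SPEI + body + XBAI + SMAI
    unit = unit[1:]                       # SpeI removes the leading A
    return unit[: unit.rindex(SMAI) + 3]  # SmaI leaves ...CCC


def assemble(first_body: str, other_bodies: list[str]) -> str:
    """Steps a) to e): ligate each unit in turn, re-cutting with XbaI."""
    chain = cut_at_xbai(first_body + XBAI)  # step a)
    for body in other_bodies:               # steps c), d) and e)
        chain = cut_at_xbai(chain + prepare_incoming(body))
    return chain
```

With the made-up bodies "ATGGCC", "GGCACC" and "GACGAG", every junction in
the result reads TCTAGT, which is a site for neither XbaI nor SpeI, so only
the newest 3' end is opened by each XbaI digestion.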

In one embodiment the assembly can occur via stepwise addition of
fragments to a vector.

In an alternative embodiment the first DNA unit can be attached to
the solid phase for use in step c). This permits the solid phase to be split
and mixed between steps c), d), and e) to make several different
assemblies. Methods of attaching DNA units to the solid phase are well
known in the art. Preferred solid phase elements are beads attached to the
DNA units via a biotinylated nucleotide, as known in the art.

The recognition sequences in one or more of the DNA units are 
preferably introduced by means of extension primers, as known in the art, 
though other methods such as the ligation of the required sequences or in 
vitro mutagenesis can also be employed. 
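
As an illustration of introducing recognition sequences through primers, the
following Python sketch builds a primer pair that appends the XbaI sequences
used in this description (5'XXTCTAGA3' at the 5' end and 5'GATCTAGA3' at the
3' end of the unit). The function names, the 20-nucleotide annealing length
and the example "CC" leader are assumptions made for illustration only.

```python
# Sketch only: names and defaults below are illustrative assumptions.
XBAI = "TCTAGA"        # XbaI recognition sequence
XBAI_DAM = "GATCTAGA"  # XbaI site preceded by GA (overlapping Dam site GATC)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_primers(unit: str, anneal_len: int = 20, leader: str = "CC"):
    """Return (forward, reverse) primers carrying the two XbaI sequences.

    leader is the 'XX' dinucleotide placed before the 5' XbaI site; it must
    not be GA, otherwise the 5' site would also overlap a Dam site.
    """
    unit = unit.upper()
    leader = leader.upper()
    if leader == "GA":
        raise ValueError("leader must not be GA")
    forward = leader + XBAI + unit[:anneal_len]
    # The reverse primer is written 5'->3' on the bottom strand.
    reverse = reverse_complement(unit[-anneal_len:] + XBAI_DAM)
    return forward, reverse


def expected_amplicon(unit: str, leader: str = "CC") -> str:
    """Top strand of the PCR product the primers are meant to give."""
    return leader.upper() + XBAI + unit.upper() + XBAI_DAM
```

A practical primer would normally also be checked for melting temperature
and secondary structure; those checks are outside this sketch.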

The assembly of several DNA units can be inserted into an
expression vector and thus used to transform a host capable of expressing
the protein encoded by the insert of the vector.

The method is particularly useful where one or more of the DNA
units encodes a catalytic or transport protein domain, for example a
ketoreductase domain from a PKS enzyme or an ACP domain from a
hybrid polyketide/peptide synthesising enzyme. Such domains can be
derived from enzyme domain DNA sequences from, for example,
polyketide synthesising enzymes, peptide synthesising enzymes, hybrid
peptide polyketide synthesising enzymes, fatty acid synthesising enzymes
or other enzyme domains known in the art.

The DNA units used in the methods of the invention can encode 
modules comprising one or more catalytic or transport domains. Usually a 
module contains all of the domains required to complete one condensation 
step in the synthesis of a target molecule. 

Alternative aspects of the invention resulting from the methods of the
invention include: DNA constructs or vectors incorporating a DNA assembly
encoding synthetic enzymes, synthetic enzymes encoded by such DNA
assemblies, hosts expressing synthetic enzymes, hybrids of transformed
hosts expressing synthetic enzymes, and compounds produced by the
synthetic enzymes.

Where the product produced by the synthetic enzyme exhibits toxicity
to a host strain, this can be worked around, e.g. by choosing a
different strain or mutating the original strain to provide mutants which are
more tolerant. The diversity of compounds produced by hosts transformed
with the synthetic enzymes of the invention can be further increased by
using known methods of using different feedstocks in the fermentation to
provide different starter units for the desired product. Where the yield of
the desired synthetic enzyme product is low, routine steps, e.g. mutation
and selection, can be taken to improve it.

The synthetic enzymes of the invention can also be used in cell-free
systems to produce the desired target molecule in vitro as known in the art;
for example, see Carreras and Khosla (1998).

In a further aspect, the invention provides a method of synthesising
a target molecule comprising the steps of:

a) examining the composition and stereochemistry of a target molecule,

b) determining which catalytic and transport domains need to be present in
a synthetic enzyme in order to catalyse the synthesis of the target
molecule,

c) using any one of the methods of the invention to assemble the required
DNA units encoding the catalytic and transport domains into a DNA
assembly that encodes said synthetic enzyme which is capable of
synthesising the target molecule,

d) placing the DNA assembly into a vector to allow expression of the
synthetic enzyme in a host capable of synthesising the target molecule
after transformation with said vector.

Target molecules are generally bio-active molecules, usually having
a predominantly carbon based backbone, and usually are macromolecules
comprised of condensed units. The transformed host can be tested for the
presence of the target molecule after step d). If yields of the desired
compound are low then conventional methods of improving product yield
from, for example, Streptomycetes can be employed. Transformed hosts
which result from the methods of the invention and their use in producing
target molecules are also aspects of the invention. Hosts suitable for
transformation with the DNA assemblies of the invention are known in the
art and include insect or mammalian cells, though bacterial cells are more
usually suitable, for example the improved host strains described by
Ziermann and Betlach (1999).

As stated previously, it is also envisaged that the synthetic enzyme 
can be used in a cell-free system to produce the target molecule in vitro. 

A further aspect of the invention is a method of making a synthetic
enzyme to catalyse the synthesis of a target molecule comprising the steps
of:

a) examining the composition and stereochemistry of a target molecule,

b) determining which catalytic and transport domains need to be present in
the synthetic enzyme in order to catalyse synthesis of the target molecule,

c) using any one of the methods of the invention to assemble the required
DNA units encoding the catalytic and transport domains into a DNA
assembly that encodes an enzyme which is capable of synthesising the
target molecule,

d) expressing the DNA assembly in a suitable host to produce the enzyme.

In a further aspect the invention provides a library of DNA units
encoding catalytic or transport protein domains, wherein each DNA unit has
a recognition sequence for a restriction enzyme at its 5'-end and a second
recognition sequence for the same or a compatible enzyme at its 3'-end
which incorporates a recognition sequence for a DNA modifying enzyme.
In a particular embodiment of such a library, each DNA unit has an XbaI
recognition sequence 5'XXTCTAGA3' (where XX is not GA) at its 5'-end
and an XbaI recognition sequence 5'GATCTAGA3' at its 3'-end.

Also provided by the invention is a library of DNA units encoding
catalytic or transport protein domains, wherein each DNA unit has a
recognition sequence at its 5' end for a first restriction enzyme, and a
downstream recognition sequence for a second restriction enzyme followed
by a downstream recognition sequence for a third restriction enzyme at its
3' end, such that the DNA units, once restricted by the first and second
restriction enzymes, can be ligated together to abolish the restriction sites
at the ligation junction. In one embodiment of this aspect of the invention
each DNA unit has a SpeI recognition sequence 5'ACTAGT3' at its 5'-end,
and a downstream XbaI recognition sequence 5'TCTAGA3' followed by a
downstream SmaI recognition sequence 5'CCCGGG3' at its 3'-end.

Catalytic or transport protein domains can be derived from any
enzyme, for example those listed above. Particularly envisaged are
libraries in which the DNA units encode polyketide synthetic domains,
comprising two KS domains, at least two AT domains, two KR domains,
two DH domains, two ER domains, an ACP domain and a TE domain.

Also provided by the invention are modules comprising a DNA
sequence encoding a functional set of polyketide synthetic domains
wherein the module has a recognition sequence for a restriction enzyme at
its 5'-end and a second recognition sequence for the same or a compatible
enzyme at its 3'-end which incorporates a recognition sequence for a DNA
modifying enzyme. An envisaged module has an XbaI recognition
sequence 5'XXTCTAGA3' (where XX is not GA) at its 5'-end and an XbaI
recognition sequence 5'GATCTAGA3' at its 3'-end.

Alternatively a module comprising a DNA sequence encoding a
functional set of polyketide synthetic domains can have a recognition
sequence at its 5' end for a first restriction enzyme, and a downstream
recognition sequence for a second restriction enzyme followed by a
downstream recognition sequence for a third restriction enzyme at its 3'
end, such that the DNA units, once restricted by the first and second
restriction enzymes, can be ligated together to abolish the restriction sites
at the ligation junction. In one particular example, the module has a SpeI
recognition sequence 5'ACTAGT3' at its 5'-end, and an upstream XbaI
recognition sequence 5'TCTAGA3' and a downstream SmaI recognition
sequence 5'CCCGGG3' at its 3'-end.

Particularly envisaged are modules wherein the DNA units encode
polyketide synthetic domains, comprising two KS domains, at least two AT
domains, two KR domains, two DH domains, two ER domains, an ACP
domain and a TE domain. It is also envisaged that other non-polyketide
enzyme domains can be included in the modules provided by the invention.

Also provided by the invention are vectors containing one or more
modules. Particularly useful are vectors in which a non-functional recA
gene is also present. Such vectors prevent unwanted homologous
recombination occurring between domains within the vector upon
integration into a suitable host by abolishing the recA gene activity in that
host. Thus the invention also provides a method of transforming a host with
one or more synthetic DNA assemblies encoding enzyme domains which
comprises the steps of:

a) inserting said DNA assembly into a vector containing a mutated internal
fragment of a recA gene sequence such that the vector is capable of
undergoing homologous recombination with the recA gene of the host,

b) bringing said vector into contact with a host chromosome under
conditions which permit homologous recombination to take place,

c) disrupting the host recA gene by the integration of the DNA of said
vector into the chromosome.

The expression vector can be used to transform a Streptomyces host. The
DNA assemblies contained in the vector can be modules as described herein.

Also envisaged are transformed hosts which, prior to transformation
with a vector containing one or more modules according to the invention,
were already lacking a recA function.

In a further aspect the invention provides kits containing DNA units,
DNA modules, vectors, DNA manipulation hosts, DNA modification hosts,
expression hosts, or solid phase elements for use in the methods of the
invention. For example, one such kit might contain a first DNA unit which is
a vector suitable for transforming a suitable host, a library of modules for
insertion into that vector, both the first DNA unit and the library having the
necessary recognition sites for use in the methods of the invention,
together with host strains suitable for the manipulation and expression of
the DNA assemblies of the invention.

A de novo "domain-by-domain" reconstruction of a hybrid
multienzyme from the erythromycin-producing PKS has been achieved by
the inventors by assembling DNA units corresponding to the constituent
domains. The assembled gene was expressed in S. erythraea and the
expected compounds were isolated from the bacterial broth. Application of
this methodology, or variations of this methodology, for making
combinatorial assemblies of complex and aromatic PKSs allows for the
rapid generation of novel or altered PKS or other synthetic multienzymes
and paves the way for a quick and inexpensive synthesis of potentially
bio-active molecules.

One alternative to chemical syntheses is to carry out a
'retrobiosynthetic analysis' of the desired molecule, by pinpointing the exact
number and type of synthetic enzyme domains that are required for every
chemical step, and then assembling the DNA units that encode these
enzymes in order to make a hybrid synthetic enzyme. The aim is therefore
to assemble these domains or even modules in a manner as desired, so
that the linked enzymes can carry out a progressive synthesis of a desired
target molecule. Until now, it has not been possible to find a methodology
to assemble these PKS DNA units using restriction enzymes and DNA
ligase to cut and join the DNA pieces together - one of the limiting factors
being the non-availability of appropriate restriction enzyme sites in the DNA
sequence of the enzymes which synthesise these polyketide drugs. There
exist very few unique restriction enzyme sites and even fewer restriction
enzymes that do not cut in the polyketide DNA sequence (i.e. are
"non-cutters"). However, the restriction enzyme XbaI, because of its TA-rich
recognition sequence (5'TCTAGA3'), does not cleave the majority of
GC-rich polyketide gene clusters. Thus, flanking both ends of the DNA of the
desired DNA unit (domain or module) with a recognition sequence that is
cleaved on one end by XbaI, and on the other end by a restriction enzyme
that is compatible with XbaI (e.g. SpeI) is possible. A vectorial assembly,
where such units are progressively joined, leaves one end of the unit that
has been constructed by the ligation of XbaI and SpeI-cut DNA ends, not
recognisable by either of the two enzymes, thus making further addition of
units possible at only one of the two ends.
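
The rarity of XbaI sites in GC-rich DNA can be made plausible with a rough
estimate. The Python sketch below assumes a simple model in which bases are
independent, with G and C each at frequency gc/2 and A and T each at
(1 - gc)/2; the 72% GC value used in the usage line is an assumed figure
typical of Streptomyces DNA and is not taken from this description. Real
sequences deviate from this model, so the result is only indicative.

```python
# Sketch only: independent-base model; function names are illustrative.

def site_probability(site: str, gc: float) -> float:
    """Probability that a random position starts the given site."""
    p = 1.0
    for base in site.upper():
        # G and C share the GC fraction; A and T share the remainder.
        p *= gc / 2 if base in "GC" else (1 - gc) / 2
    return p


def expected_sites(site: str, gc: float, length: int) -> float:
    """Expected number of occurrences on one strand of the given length."""
    positions = max(length - len(site) + 1, 0)
    return site_probability(site, gc) * positions


if __name__ == "__main__":
    for gc in (0.50, 0.72):  # 0.72 is an assumed Streptomyces-like value
        print(gc, expected_sites("TCTAGA", gc, 33000))
```

Because four of the six positions in 5'TCTAGA3' are A or T, the expected
count drops steeply as the GC fraction rises, which is the effect relied
upon in the paragraph above.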

This strategy makes use of selective recognition of the restriction
enzyme site by the restriction enzyme XbaI, depending upon the sequence
adjacent to the restriction enzyme site and upon the strain used (dam+ or
dam-) during the assembly process. The method has been shown to be
successful, and by using this methodology to assemble modules, the
complete erythromycin-producing PKS (comprising six modules coded
by three large open reading frames) can be built in under 10 days. This
time-period is already small compared to what it would take to
assemble the ery PKS genes using conventional methodologies; using a
variation of the above-mentioned methodology, however, complete
gene-clusters, like the 33 kbp erythromycin PKS, can be built within a
matter of hours.

Also described herein is an approach wherein the assembly of the
units itself can also be carried out in vitro without the need for an in vivo
DNA modification step. Furthermore, employing the in vitro assembly
methodology described below, one is now able to construct not only
predetermined PKS genes, but also a randomly constructed combinatorial
library of shuffled domains from one or more known synthetic enzymes.
This has immediate and important implications for drug-discovery.
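
The size of such a combinatorial library follows from simple enumeration:
if one unit is chosen from a pool at each position, the number of distinct
assemblies is the product of the pool sizes. The Python sketch below
illustrates this; the pool contents and all names are hypothetical and are
not taken from this description.

```python
# Sketch only: pool contents and names are hypothetical.
from itertools import product
from math import prod


def enumerate_assemblies(pools):
    """Return every ordered assembly, one unit chosen from each pool."""
    return ["-".join(combo) for combo in product(*pools)]


def library_size(pools):
    """Number of distinct assemblies: the product of the pool sizes."""
    return prod(len(pool) for pool in pools)


# Two interchangeable variants at each of four positions.
POOLS = [
    ["KS_a", "KS_b"],
    ["AT_a", "AT_b"],
    ["KR_a", "KR_b"],
    ["ACP_a", "ACP_b"],
]

if __name__ == "__main__":
    print(library_size(POOLS))
    print(enumerate_assemblies(POOLS)[:3])
```

Adding one more variant to any pool multiplies the library size, which is
why diversity in the incoming units increases the library so quickly.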

The methodology thus outlined requires DNA units to be modified so
that they contain the appropriate 5' and 3' ends (X and Xd respectively).
These units are then progressively assembled to achieve the desired gene
length. The vector containing the assembled or reconstructed gene is then
used to transform an expression system to achieve protein expression.
This methodology has been shown to work effectively - the hybrid
multienzyme DEBS1-TE was reconstructed by assembling de novo the ten
constituent domains. The assembled gene, when expressed in S. erythraea,
gave the expected six-membered triketide lactones.

However, in the case of larger molecules like discodermolide, one
would require a vectorial assembly of some 50 or so PKS units (if the units
are domains). A hypothetical PKS that would make a molecule as large as
discodermolide would require 12 modules, each possessing the
appropriate KS, AT, ACP and a set of reductive domains (e.g. KR, DH or
ER). One would find that some of the domains in this group of 50 would be
required to carry out the same catalytic function. For example, if all the
hydroxy groups resulting from the ketoreductase activity from all 12
modules are of the same configuration, in effect 12 KRs that function in an
identical fashion are required. Also, all 12 ACPs would, of course, have the
same catalytic function. It would therefore logically be more convenient,
and less time-consuming, if, to achieve ketoreduction from every one of the
12 modules, one used only one KR domain instead of 12 different ones in
all the modules, or one ACP instead of 12 different ACPs. In fact, one can
calculate that for every possible chemical reaction that can be carried out
using PKS domains, one requires a set of only 12 domains, that in theory
can be used repeatedly (Figure 1).
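
The domain count quoted above can be checked with a short calculation. The
Python sketch below assumes the usual simplification that each module
carries KS, AT and ACP plus whichever reductive domains match the required
reduction level; the level names, the function names and the choice of
twelve "hydroxy" modules are illustrative assumptions, since this
description does not give the per-module requirements for discodermolide.

```python
# Sketch only: level names and the example module list are assumptions.
REDUCTIVE = {
    "keto": (),                      # no reduction
    "hydroxy": ("KR",),              # ketoreduction only
    "enoyl": ("DH", "KR"),           # plus dehydration
    "methylene": ("DH", "ER", "KR")  # plus enoyl reduction
}


def module_domains(level: str) -> tuple:
    """Domains of one extension module for the given reduction level."""
    return ("KS", "AT") + REDUCTIVE[level] + ("ACP",)


def count_domains(levels) -> int:
    """Total number of domain units to assemble for the listed modules."""
    return sum(len(module_domains(level)) for level in levels)


def distinct_domain_types(levels) -> set:
    """Domain types needed if identical domains may be reused."""
    return {d for level in levels for d in module_domains(level)}


if __name__ == "__main__":
    modules = ["hydroxy"] * 12  # hypothetical 12-module synthase
    print(count_domains(modules), sorted(distinct_domain_types(modules)))
```

For twelve modules that each stop at the hydroxy level this gives 48 domain
units but only four distinct domain types, which illustrates why reusing
identical domains shortens the parts list so sharply.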

It is possible that inter-modular recombination events within the
reconstituted PKS or other synthetic enzyme gene may preclude the use
of identical PKS or other enzyme domain DNA units in a set of modules. It
might be expected, for example (Figure 2), that the ACP* DNA in module 1
would recombine with the identical ACP* DNA in module 3. This event can
take place, for example, when the expression vector that possesses the
assembled gene containing numerous identical PKS DNA units is used to
transform a Streptomyces host for polyketide production.

The inventors have developed a strategy that can circumvent this
problem, therefore making it possible to construct large synthetic enzyme
gene clusters using identical domains or modules repeatedly. This translates
into a less expensive route towards synthetic enzyme gene construction (one
would not need a start-up library of 200 or so to cover all
possibilities), as the set of 12 domains, or similar functional arrangements of
domains, are true "off-the-shelf" components for the assembly of PKS genes
or genes for other hybrid synthetic enzymes.

The inventors provide methods of DNA assembly that pave the way for a
cheap and fast synthesis of a host of bio-active molecules, e.g. the
anti-cancer drug discodermolide.

The examples that follow are better described with reference to the
following figures:

Figure 1 shows the chemical/stereochemical choices that each PKS domain
can make. A total of 12 domains are required for every conceivable
polyketide reaction.

Figure 2 shows integration of a plasmid containing more than one identical
DNA unit (ACP*). After the plasmid has integrated in the Streptomyces host
through homologous recombination with TE, internal recombination can occur
to yield truncated PKS genes. This is because the host is recA+.

Figures 3A and 3B show a schematic representation of the assembly process.
The de novo construction of DEBS1-TE. DNA fragments (units) encoding
the constituent domains of the multienzyme DEBS1-TE were inserted
sequentially into the expression plasmid pCJR24.
The final plasmid pAR10 was then expressed in S. erythraea/JC2 to yield the
expected triketide lactone products that are synthesised by the schematically
shown re-assembled DEBS1-TE synthase. The amino acid changes made
within the linker regions between domains are shown below the actual amino
acid sequence. Construction of the expression plasmid pAR10 and structural
characterisation of the two triketide lactones shown in the above figure is
described in the methods section. X - XbaI restriction enzyme recognition
sequence (5'TCTAGA3'); Xd - XbaI and Dam methylase recognition
sequence (5'GATCTAGA3').

Figure 4 shows the methodology of the assembly of DNA units using
XbaI/dam methylase technology. During the second last stage of assembly,
indicated as transform and cut in the figure, transformation of a dam- strain
with plasmid is effected (as it is a dam- strain, even Xd would be cleaved by
XbaI). Cutting is achieved by XbaI and the DNA unit purified on a gel.

Figure 5 shows the procedure for the assembly of DNA units using XbaI/dam
methylase technology.

Figure 6 shows how an XbaI site can be made sensitive to methylation.
The RE cuts at the sites shown by arrows. The boxed sequence is
methylated in a dam+ strain thereby altering the XbaI recognition site. The
sequence however is not methylated in a dam- strain, and so can still be
cleaved by XbaI. The XbaI recognition sequence (5'TCTAGA3') can
therefore be selectively cleaved by XbaI. Assembly of DNA units uses only
one restriction enzyme - XbaI.

Figure 7 shows the methodology of the in vitro assembly of DNA units - I
using solid phase beads with the enzymes XbaI, SpeI and SmaI (other
XbaI-compatible REs may be used).

Figures 8 and 9 show how the methodology of the in vitro assembly of DNA
units - II would proceed to the point of placing the DNA assembly into an
expression vector for transforming an appropriate host. In vitro assembly of
DNA units (domains) from the first multienzyme of the erythromycin-producing
PKS.

Figure 10 shows how in one single ligation, 16 ongoing assemblies are
generated. This cascade can attain exponential proportions. The gene
library can be increased by increasing the diversity of the incoming unit.

Figure 11 shows the integration of an expression plasmid into a Streptomyces
host, using a mutated internal fragment of the recA gene as the region for
homologous recombination. The resulting PKS gene can now contain more
than one identical DNA unit as the strain has been made recA minus.

Figure 12 shows the assembled PKS recADEBS1-TE. The second module is
composed of domains that normally belong to the first module.

Figure 13 shows the amino acid sequence alignment of the recA protein of S.
lividans (S.l.) and S. ambofaciens (S.a.). Percent similarity: 96.496, percent
identity: 95.418. Match display thresholds for the alignment(s):
| = identity
: = 2
. = 1

Figures 14A and 14B show a DNA sequence alignment of the recA gene of S.
lividans (S.l.) and S. ambofaciens (S.a.). Start of the gene is from 'ATG' and
stop is 'TGA'. Percent similarity: 94.713, percent identity: 94.713.

Figure 15 shows how an XbaI/SpeI system might be used instead of an
XbaI/dam methylase system to assemble DNA units, a strategy involving
compatible restriction enzymes flanking either end of a DNA unit. An example
of compatible REs would be XbaI and SpeI. The recognition sequence of
XbaI is 5'TCTAGA3' and that for SpeI is 5'ACTAGT3'. After XbaI and SpeI
have cleaved the DNA at their respective sites, the DNA units can be ligated
together as the overhangs are complementary. The junction where any two
units are joined is now recognised by neither XbaI nor SpeI.

Figure 16 is a schematic representation of the compatibility of XbaI- and
SpeI-digested DNA overhangs. It shows the compatibility of the sticky ends
produced by XbaI and SpeI and how re-ligation abolishes both sites.

Figure 17 shows a schematic representation of the erythromycin-producing
polyketide synthase; primary organisation of the genes and their
corresponding protein domains. The multienzymes deoxyerythronolide B
synthase 1 (DEBS1), DEBS2 and DEBS3 each have two modules, each of
which processes one cycle of polyketide chain extension. Each of the six
modules is constituted by covalently-linked enzymatic domains. Exploitation
of such an enzymatic hierarchy as "off-the-shelf" reagents can lead to
synthesis of important chemical compounds.

Figure 18 shows the structure of the anticancer drug discodermolide (top) and
the 'retrobiosynthetic approach' towards synthesising a target molecule (a
discodermolide). Such an approach would involve opening up the structure
(a.), identifying the number and type of polyketide carbon units that would
make the discodermolide carbon skeleton (b.), and choosing the PKS DNA
units (modules/domains) responsible for the uptake and subsequent
processing of the carbon units (c.).

Figure 19 shows the anti-tumour compound octalactin and the strategy behind
the retrobiosynthetic approach towards synthesising bio-active molecules.
The strategy comprises the steps of:

Identify polyketide units - e.g. whether acetate, propionate, etc.

Break-up and identify - break up the carbon skeleton and identify how many
such carbon units are present. Eight units would mean one requires eight
modules to make a PKS.

Choose - choose the modules or domains that would be required, from an
existing library of such PKS modules and domains.

Assemble - assemble the DNA units (modules/domains) using the invention.

Express - express the assembled gene in a host and check for compound
production.

Figure 20 shows a schematic representation of the hypothetical polyketide
synthase for synthesising octalactin B, assembled from enzyme units that
belong to various PKSs in the public domain.

Figure 21 shows a schematic representation of the hypothetical
decarestrictine polyketide synthase for synthesising the anti-cholesterol
compound decarestrictine J, assembled from enzyme units that belong to
various PKSs in the public domain.

Examples

Example 1: Vectorial assembly of DNA units

DNA units that are to be assembled contain the XbaI recognition
sequence at either end of the unit. At one of the ends, two nucleotides (GA)
are arranged at the 5' end of the XbaI recognition sequence (thus making it
5'GATCTAGA3'). This is achieved by first incorporating the XbaI
recognition sequences in the oligonucleotide primers and then amplifying
the desired DNA unit by PCR. The PCR products are then ligated to a
pUC-18 vector, used to transform a dam+ strain of E. coli, and the clones
isolated and sequenced for possible errors in the PCR products. A dam+
strain of E. coli - like DH10B™ - methylates the nucleotide A in the
sequence GATCTAGA, as 5'GATC3' is a sequence that is recognised by
the product of the Dam methylase gene (Fujimoto et al., 1965; Geier et al.,
1979). This makes only one end of the DNA unit cleavable by XbaI. The
vector is then used to transform a dam- strain of E. coli (e.g. ET12567 -
MacNeil et al. (1992)) and the plasmid DNA isolated. This DNA is now
cleavable at both ends of the DNA unit by XbaI. When a library of units has
been constructed using this strategy, and both ends of these units have
been cleaved by XbaI, they are progressively inserted into a vector that has
a unique XbaI site and the ligated products are always used to transform a
dam+ strain of E. coli, thereby making sure that one end of the DNA unit is
always protected from cleavage by XbaI through methylation. When the
assembly of such units is completed, the final plasmid is integrated into a
Streptomyces strain for the production of the desired polyketide.
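
The selective protection described above can be illustrated with a small
Python sketch that lists which XbaI sites in a top-strand sequence would
remain cleavable after passage through a dam+ or dam- strain. It is a
simplified model with hypothetical function names: a site is treated as
blocked whenever a GATC overlaps it, and the example sequence is made up.

```python
# Sketch only: simplified model of Dam protection; names are illustrative.
import re

XBAI = "TCTAGA"
DAM = "GATC"


def xbai_sites(seq: str) -> list:
    """Start positions of every XbaI site (overlaps allowed)."""
    return [m.start() for m in re.finditer("(?=" + XBAI + ")", seq.upper())]


def is_dam_blocked(seq: str, pos: int) -> bool:
    """True if a GATC overlaps the XbaI site starting at pos.

    GATCTAGA (GA upstream) and TCTAGATC (TC downstream) are the same
    duplex read from opposite strands, so both are checked.
    """
    seq = seq.upper()
    upstream = seq[max(pos - 2, 0): pos + 2] == DAM
    downstream = seq[pos + 4: pos + 8] == DAM
    return upstream or downstream


def cleavable_sites(seq: str, dam_plus: bool) -> list:
    """XbaI sites that can still be cut after growth in the given strain."""
    return [
        pos for pos in xbai_sites(seq)
        if not (dam_plus and is_dam_blocked(seq, pos))
    ]


if __name__ == "__main__":
    unit = "CCTCTAGAATGGCCGGCACCGATCTAGAGG"  # made-up unit with X and Xd
    print(cleavable_sites(unit, dam_plus=True))
    print(cleavable_sites(unit, dam_plus=False))
```

In this model the made-up unit has one cleavable site after a dam+ passage
and two after a dam- passage, mirroring the behaviour of X and Xd.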

Using this methodology, the polyketide synthase DEBS1-TE, a
multienzyme that has the first of the three bimodular erythromycin DEBS
enzymes (DEBS1), fused with the erythromycin thioesterase (Cortes et al.,
1995) was constructed in a de novo fashion. The ten inherent PKS
domains in DEBS1-TE, namely, loading module (itself composed of an AT
and an ACP), KS1 (ketosynthase of module 1), AT1, KR1, ACP1, KS2



WO 00/77181 



PCT/GB00/02286 



-23- 

(ketosynthase of module 2), AT2, KR2, ACP2 and TE function in
conjunction to catalyse the synthesis of
(2R,3S,4S,5R)-2,4-dimethyl-3,5-dihydroxy-n-hexanoic acid δ-lactone (2)
(Figure 3).

The DNA for all ten domains was amplified by PCR to incorporate
the two aforementioned recognition sequences for XbaI (5'TCTAGA3' and
5'GATCTAGA3') at the 5' and 3' ends of the DNA unit respectively. The
PCR products were cloned in pUC18 vector, sequenced, and then used to
transform the dam- E. coli ET12567 strain. To initiate the assembly
process, the DNA unit for TE was inserted into S. erythraea expression
vector pCJR24 (Rowe et al., 1998) which has a unique XbaI site. This
vector also contains a thiostrepton-resistance gene as a marker for
identifying successful integrants. The ligated products were used to
transform the dam+ E. coli DH10B™ strain and the plasmid DNA isolated.
This plasmid (pAR1) can only be singly cleaved with XbaI, despite
possessing two XbaI recognition sequences, as one of the sites (situated at
the 3' end of the TE unit) has been methylated by the E. coli Dam
methylase. The next DNA unit (ACP2 from module 2 of DEBS1) was then
ligated to the XbaI-cut pAR1, the ligation mixture used to transform DH10B
cells and the plasmid DNA isolated. Likewise, the other eight DNA units
were successively added to pAR1 to finally yield the expression plasmid
pAR10 containing the reconstituted DEBS1-TE gene (Figure 3). The
junctions where these domains were joined were chosen in the linker
regions that lie between these domains, so as to cause minimum
disturbance of the structural features of these domains, that might in turn
affect the proficiency of the domains themselves (Figure 3). Plasmid pAR10
was then used to transform S. erythraea/JC2 - a mutant strain of the
wild-type S. erythraea NRRL2338 that lacks the DEBS genes except for the TE
DNA fragment (Rowe et al., 1998). Thiostrepton-resistant colonies were
selected upon integration of the vector into the S. erythraea chromosome.

Single transformants were grown on selective media, as described in the
methods section. The fermentation broth was extracted with ethyl acetate
and a sample of the organic extract was analysed by gas
chromatography-mass spectroscopy (GC-MS). Two peaks were observed,
corresponding to molecular masses 158 and 172, indicating the presence of
the expected acetate- and propionate-derived polyketides
(2R,3S,4S,5R)-2,4-dimethyl-3,5-dihydroxy-n-pentanoic acid δ-lactone (1) and
(2R,3S,4S,5R)-2,4-dimethyl-3,5-dihydroxy-n-hexanoic acid δ-lactone (2).
Both compounds were isolated and fully characterised by high-pressure liquid
chromatography (HPLC), 1H 1D and 2D NMR, 13C NMR, FT-ICR
spectrometry, and by comparison with a synthetic standard of (2) (Brown et
al., 1995). One litre of fermentation broth produces 24 mg of (1) and 56 mg
of (2) - yields that are comparable to those reported elsewhere (Lau et al.,
1999). It can therefore be asserted that the ten newly constructed
inter-domain junctions have not in any way dimmed the catalytic proficiency
of the DEBS1-TE synthase.

In the absence of any crystal-structure data on PKS domains, all
genetic engineering efforts known in the art have been based on trial-and-
error methods of experimenting with where to join two such domains. As a
result, the yield of the synthesised polyketide products has varied
depending upon the position in the polypeptide chain at which the domains
or modules have been linked (McDaniel et al., 1999; Ruan et al., 1997).
The successful functioning of the reconstructed polyketide synthase
described above has supplied new information about the inter-domain
junction sites. Using this information, and the described methodology for
the rapid assembly of these enzyme units, it is now possible to carry out a
'retrobiosynthetic analysis' of target molecules and then to use polyketide
and other biosynthetic enzyme domains as truly 'off-the-shelf' reagents to
achieve a stereospecific synthesis. There is also the possibility of using this
methodology for randomly combining DNA units that encode catalytic (e.g.
DH) or transport (e.g. ACP) protein domains to generate combinatorial
libraries of hybrid synthases. By using a suitable assay system to test for
biological activity of the compounds that are generated by such means, it is
possible to go back and isolate the hybrid synthetic gene responsible for the
production of these compounds.

From 6-methylsalicylic acid to maitotoxin, nature displays a
staggering diversity in compounds that are synthesised by means of
'combinatorial gene-shuffling'. This methodology, or variations of this
methodology, can be used as an effective tool towards harnessing the
combinatorial potential of discrete enzymatic units or their sets that are the
feature of multifunctional PKS and other systems.

A similar system to the XbaI/dam system described above uses the
restriction enzyme FokI, which has the recognition site:

5' GGATG(N)9 ↓ 3'
3' CCTAC(N)13 ↑ 5'

with the dcm methylase of E. coli. Adding CCA or CCT to the 5' end of the
FokI recognition site would make the site dcm sensitive. Furthermore, if
the sequence TCTAGA were inserted into the redundant section of the FokI
restriction site, then the enzyme could be used to generate 'XbaI-cut ends'.
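
The sequence logic of the FokI/dcm variant can be checked with a short sketch. It assumes the standard dcm recognition sequence CCWGG (W = A or T), which is not spelled out in the text above, and the function name is purely illustrative. It only tests whether a dcm site overlaps the FokI site; whether that overlap blocks cleavage in practice is the premise stated above, not something the code demonstrates.

```python
import re

FOKI_SITE = "GGATG"
# Assumption: dcm methylase recognises CCWGG, where W is A or T.
# A lookahead is used so that overlapping dcm sites are all reported.
DCM_PATTERN = re.compile(r"(?=(CC[AT]GG))")


def dcm_overlaps_foki(sequence: str) -> bool:
    """Return True if any dcm site (CCWGG) overlaps a FokI site (GGATG)."""
    sequence = sequence.upper()
    foki_spans = [(m.start(), m.start() + len(FOKI_SITE))
                  for m in re.finditer(f"(?={FOKI_SITE})", sequence)]
    dcm_spans = [(m.start(1), m.end(1)) for m in DCM_PATTERN.finditer(sequence)]
    # Two half-open intervals overlap when each starts before the other ends.
    return any(d_start < f_end and f_start < d_end
               for f_start, f_end in foki_spans
               for d_start, d_end in dcm_spans)


if __name__ == "__main__":
    for prefix in ("CCA", "CCT", "AAA"):
        site = prefix + FOKI_SITE + "ACGTACGTA"
        print(prefix, dcm_overlaps_foki(site))
```

Prefixing GGATG with CCA or CCT yields CCAGGATG or CCTGGATG, each of which contains a CCWGG motif sharing two nucleotides with the FokI site.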
Methods 

E. coli dam+ DH10B™ strain was purchased from Gibco BRL, USA.
Pfu DNA polymerase was purchased from Boehringer, Germany.

Construction of the final expression plasmid pAR10 was carried out
in several steps, as follows. The ten PKS DNA units were amplified by PCR
using Pfu DNA polymerase. The respective regions of the eryAI gene, as well
as the oligonucleotides used for each PCR, are outlined below:
LM - segment of eryAI gene (Bevitt et al., 1992) extending from nucleotide
(N) 588 to N 2389;
5'GGCATATGGCGGACCTGTCAAAGCTCTCCGACAGT3'
and
5'GGTCTAGATCCCAGCCGCGGTCGGTCGGCAGTCCCG3',

KS1 - segment of eryAI gene extending from N 2384 to N 3769;
5'GGTCTAGACTCGCTGTTCCACCCCGACCCCACGCGCTCGGGCACCGCGCACCA3'
and
5'GGTCTAGATCGCGCAGCGCGGCGGACTCGTCGACGGGGGCGAAGGCGG3',

AT1 - segment of eryAI gene extending from N 3764 to N 4813;
5'GGTCTAGACGGTCTCGCGACGGGAAACGCCGACGGTGCCGCCGTTGGAA3'
and
5'GGTCTAGATCCACCGCGACACCGGCGGCGAACGCGCGGGAGAGCGCTTCGC3',

KR1 - segment of eryAI gene extending from N 4808 to N 6316;
5'GGTCTAGAGTCGGTGCACCTGGGCACCGGAGCACGCCGGGTGCCCTT3'
and
5'GGTCTAGATCGTCGAAGAGCCTGGTCGGGCGCTGCGCGGTGTA3',

ACP1 - segment of eryAI gene extending from N 6311 to N 6679;
5'GGTCTAGACGACGCGCGGCGGGCTGCGCCGCAGGCGCCGGCCGAACCGCGGG3'
and
5'GGTCTAGATCGGCCGTGGTCGCCGGTGCCGCCTGCTCGGCT3',

KS2 - segment of eryAI gene extending from N 6674 to N 8200;
5'GGTCTAGACGAGCCGATCGCGATCGTCGGCATGGCGTGCCGGCTGC3'
and
5'GGTCTAGATCGTGCACGGCCTCGGCGGTGTCGGCGGCGAGCACCGCGGCCCGCTCCTC3',

AT2 - segment of eryAI gene extending from N 8195 to N 9340;
5'GGTCTAGAGGCGGTGGCCGACGGCGCGGTGGTT3'
and
5'GGTCTAGATCGTCACGAGGGGTGGTGCGGTCCGGCAGCAGCCAGAA3',

KR2 - segment of eryAI gene extending from N 9335 to N 10639;
5'GGTCTAGACGGCTGGTTCTACCGGGTCGACTGGACCGAG3'
and
5'GGTCTAGATCCGGCCGGGGCCGGGCGGCGGTGTAGGACT3',

ACP2 - segment of eryAI gene extending from N 10634 to N 10966;
5'GGTCTAGACCGCATCGTCACGACCGCGCCGAGCGA3'
and
5'GGTCTAGATCGGCGTCGAGGAAA3',

TE - segment of eryAIII gene (Donadio et al., 1991) extending from N 8753
to N 9602;
5'GGTCTAGACAGCGGGACTCCCGCCCGGGAAGCG3'
and
5'GGGCTAGCTCTAGATCATGAATTCCCTCCGCCCAGCCAGGCGTC3'.
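
As a consistency check on the coordinates listed above, the following sketch computes how many nucleotides each pair of adjacent eryAI segments shares. The coordinates are taken from the list; the function names are illustrative. Reading the shared stretch as the hexamer at which the junction restriction site is engineered is an inference, not something the text states.

```python
# (unit name, first nucleotide, last nucleotide) for the eryAI segments
# listed above; TE is omitted because it comes from a different gene (eryAIII).
ERYAI_UNITS = [
    ("LM", 588, 2389), ("KS1", 2384, 3769), ("AT1", 3764, 4813),
    ("KR1", 4808, 6316), ("ACP1", 6311, 6679), ("KS2", 6674, 8200),
    ("AT2", 8195, 9340), ("KR2", 9335, 10639), ("ACP2", 10634, 10966),
]


def junction_overlaps(units):
    """Map (upstream, downstream) to the number of nucleotides they share."""
    overlaps = {}
    for (name_a, _, end_a), (name_b, start_b, _) in zip(units, units[1:]):
        overlaps[(name_a, name_b)] = end_a - start_b + 1
    return overlaps


def unit_lengths(units):
    """Map each unit name to the length of its amplified segment."""
    return {name: end - start + 1 for name, start, end in units}


if __name__ == "__main__":
    for pair, shared in junction_overlaps(ERYAI_UNITS).items():
        print(pair, shared)
    print(unit_lengths(ERYAI_UNITS))
```

Every adjacent pair shares six nucleotides, the length of a single hexameric recognition sequence such as that of XbaI.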

All PCR products were 5' phosphorylated and ligated to SmaI-cut,
dephosphorylated pUC18 vector and used to transform E. coli DH10B
electrocompetent cells. The desired plasmids containing the amplified
DNA fragments were isolated and sequenced using standard pUC forward
and reverse primers. No mistakes in the amplified products were detected.
All ten plasmids were then used to transform the E. coli ET12567 dam-
strain. Isolated DNA was digested with XbaI restriction enzyme and desired
fragments isolated and purified. The TE unit was then ligated to XbaI-cut
pCJR24 vector and the ligation products used to transform E. coli DH10B
electrocompetent cells. Plasmid pAR1 was isolated, digested with XbaI,
and ligated to the ACP2 fragment, and ligation products treated as
mentioned above. The other DNA fragments, namely, KR2, AT2, KS2,
ACP1, KR1, AT1 and KS1 were sequentially added to finally yield plasmid
pAR10. This plasmid was then digested with NdeI and XbaI restriction
enzymes and ligated with the LM fragment previously digested with the
same two enzymes. The ligated products were used to transform E. coli
DH10B electrocompetent cells and the final expression plasmid pAR10
isolated. Plasmid pAR10 was then used to transform S. erythraea/JC2
strain and colonies carrying the expression plasmid were selected through
resistance to thiostrepton upon integration of the plasmid into the S.
erythraea chromosome. Single transformants were picked and grown on
tap-water medium plates supplemented with thiostrepton, following which
single transformants were grown in 5 x 200 ml of SM3 liquid media
supplemented with 5 µg/ml of thiostrepton for seven days (Rowe et al.,
1998). Cells were removed by centrifugation, the supernatant was
saturated with NaCl and extracted three times with equal volumes of ethyl
acetate at pH 4.0. The solvent was evaporated to yield 1.12 g of crude
product. A sample of this crude product was analysed by GC-MS. Two
peaks were observed, corresponding to molecular masses 158 and 172,
indicating the presence of the expected acetate- and propionate-derived
polyketides (2R,3S,4S,5R)-2,4-dimethyl-3,5-dihydroxy-n-pentanoic acid δ-
lactone (1) and (2R,3S,4S,5R)-2,4-dimethyl-3,5-dihydroxy-n-hexanoic acid
δ-lactone (2). Compounds (1) and (2) were found to be structurally identical
to those reported previously (Cortes et al., 1995).
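
The two GC-MS masses can be cross-checked with a nominal-mass calculation. The molecular formulae used here, C8H14O3 for (1) and C9H16O3 for (2), are assumptions chosen to match the reported masses; they are not stated in the text, and the helper name is illustrative.

```python
import re

# Nominal (integer) atomic masses of the most abundant isotopes.
NOMINAL_MASS = {"C": 12, "H": 1, "O": 16}


def nominal_mass(formula: str) -> int:
    """Sum nominal atomic masses for a simple formula such as 'C8H14O3'."""
    total = 0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += NOMINAL_MASS[element] * (int(count) if count else 1)
    return total


if __name__ == "__main__":
    # Assumed formulae for lactones (1) and (2); not given in the text.
    for label, formula in (("(1)", "C8H14O3"), ("(2)", "C9H16O3")):
        print(label, formula, nominal_mass(formula))
```

Under these assumed formulae the two products differ by one CH2 unit (14 mass units), as expected when an acetate-derived starter is replaced by a propionate-derived one.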
Characterisation of (2R,3S,4S,5R)-2,4-dimethyl-3,5-dihydroxy-n-pentanoic
acid δ-lactone (1)

1H NMR (CDCl3, 500 MHz) δH 4.45-4.35 (1H, dq, J = 6.56 and 1.62 Hz,
C5-H), 3.8 (1H, dd, J = 10.15 and 4.17 Hz, C3-H), 2.45-2.70 (1H, br, O-H),
2.42 (1H, dq, J = 10.0 and 6.97 Hz, C2-H), 2.05 (1H, m, C4-H), 1.37 (3H, d,
J = 7.17 Hz, C2-CH3), 1.32 (3H, d, J = 6.74 Hz, C5-CH3), 0.95 (3H, d,
J = 7.20 Hz, C4-CH3) ppm. 13C NMR (CDCl3, 250 MHz) δ 174.20, 76.15,
73.62, 39.42, 38.14, 18.11, 14.24, 4.48.

Characterisation of (2R,3S,4S,5R)-2,4-dimethyl-3,5-dihydroxy-n-hexanoic
acid δ-lactone (2)

1H NMR (CDCl3, 500 MHz) δH 4.13 (1H, ddd, J = 8.12, 5.93 and 2.19 Hz,
C5-H), 3.82 (1H, m, C3-H), 2.42-2.50 (1H, dq, J = 10.17 and 7.08 Hz, C2-H),
2.12-2.19 (1H, m, C4-H), 1.77-1.86 (1H, m, one of C6-H2), 1.52-1.61 (1H, m,
one of C6-H2), 1.4 (3H, d, J = 7.09 Hz, C2-CH3), 1.0 (3H, t, J = 7.42 Hz,
C6-CH3), 0.97 (3H, d, J = 6.96 Hz, C4-CH3) ppm. 13C NMR (CDCl3,
250 MHz) δ 173.56, 81.34, 73.96, 40.08, 36.76, 25.27, 14.27, 9.88, 4.37.

Example 2: in vitro assembly of DNA units

Figure 7 outlines the strategy for the in vitro assembly of PKS DNA
units. The inventors have constructed the multienzyme DEBS1-TE. The in
vivo construction of the gene for DEBS1-TE, it should be noted, took 12
days to complete. The in vitro assembly, on the other hand, was completed
in 2 days.

All ten domains, namely, LM, KS1, KR1, AT1, ACP1, KS2, AT2,
KR2, ACP2 and TE were amplified by means of PCR. The forward primer
in all cases except the LM contained the SpeI recognition sequence
5'ACTAGT3', while the reverse primer was engineered in such a way that it
contained the XbaI recognition sequence 5'TCTAGA3' and the SmaI
recognition sequence 5'CCCGGG3' downstream of the XbaI site (Figure 7).
The amplification of the LM was carried out using a biotinylated forward
primer and a reverse primer that contained the XbaI recognition sequence
(5'TCTAGA3'). All the PCR products were cloned in pUC-18 vector and the
resulting plasmids sequenced to detect possible errors introduced by PCR.
All plasmids, except the one containing the LM unit, were then digested with
SpeI and SmaI, dephosphorylated in order to remove the 5' phosphate
group and the appropriate fragments isolated and eluted. The LM unit was
cleaved with XbaI and attached to a bead that was coated with streptavidin
(following the manufacturer's instructions) as shown in Figure 7.

The assembly process was initiated by adding DNA ligase to the
tube containing a large excess of the first unit (KS1) and LM-bead. The
reason for having a large excess of the KS1 unit compared to the LM-bead
unit is to favour the LM-bead ligating to the incoming unit, as opposed to
the self-ligation of the LM-bead (see Figure 7). The ligation of the two DNA
fragments is unidirectional, as only the SpeI-cut end of KS1 complements
the XbaI-cut end of the LM-bead. After the ligation was complete, the
desired product of the ligation reaction, namely 'bead-LM-KS1', was
separated from the reaction mixture and washed. This product was then
cleaved with XbaI, in order to activate the 3' end of KS1. The beads were
washed again to remove the small XbaI-SmaI DNA fragment that was
released from the 3' end of KS1 as a result of RE cleavage. The 'activated'
bead-LM-KS1 unit was then ligated with SpeI, SmaI-cut and 5'
dephosphorylated AT1. The SpeI-cut 5' end of AT1 complemented the
XbaI-cut 3' end of KS1 to give bead-LM-KS1-AT1 as shown in Figure 8.
This product was separated from the reaction mixture and washed as
before. The 3' end of AT1 in this product was then 'activated' through
cleavage by XbaI, and the assembly process continued.

Finally, SpeI, SmaI-cut and 5' dephosphorylated TE unit was ligated
with the DNA fragment that was now bead-LM-KS1-AT1-KR1-ACP1-KS2-
AT2-KR2-ACP2 as shown in Figure 9. The 3' end of the latter fragment was
'activated' by digesting it with XbaI. The assembled DEBS1-TE gene was
then inserted in the expression plasmid pCJR24 and the resulting plasmid
used to transform a streptomyces strain. The expected triketide lactone
products were isolated and structurally characterised.
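
The ligate-then-activate cycle described above can be modelled on the top strand alone. The sketch below is a toy simulation under stated assumptions: unit bodies are short made-up strings containing no XbaI, SpeI or SmaI sites, only the top strand is tracked, and all function names are hypothetical. It relies on the known cut positions T^CTAGA (XbaI), A^CTAGT (SpeI) and CCC^GGG (SmaI).

```python
XBAI, SPEI, SMAI = "TCTAGA", "ACTAGT", "CCCGGG"


def make_unit(body: str) -> str:
    """Top strand of an amplified unit: SpeI site, body, XbaI site, SmaI site."""
    return SPEI + body + XBAI + SMAI


def cut_spei_smai(unit: str) -> str:
    """Keep the fragment between the SpeI cut (A^CTAGT) and SmaI cut (CCC^GGG)."""
    if not (unit.startswith(SPEI) and unit.endswith(SMAI)):
        raise ValueError("unit must start with a SpeI site and end with a SmaI site")
    return unit[1:-3]


def activate_with_xbai(chain: str) -> str:
    """Cut at the single XbaI site (T^CTAGA) and keep the bead-bound upstream part."""
    if chain.count(XBAI) != 1:
        raise ValueError("expected exactly one XbaI site in the growing chain")
    return chain[: chain.index(XBAI) + 1]


def ligate(activated_chain: str, incoming: str) -> str:
    """Join an XbaI-cut 3' end to a SpeI-cut 5' end through the shared CTAG overhang."""
    if not incoming.startswith("CTAG"):
        raise ValueError("incoming fragment must carry a SpeI-cut 5' end")
    return activated_chain + incoming


def assemble(first_body: str, bodies: list[str]) -> str:
    """Build bead-bound first unit, then add each further unit in order."""
    chain = activate_with_xbai(first_body + XBAI)
    for body in bodies:
        chain = ligate(chain, cut_spei_smai(make_unit(body)))
        chain = activate_with_xbai(chain)
    return chain


if __name__ == "__main__":
    print(assemble("AAAA", ["GGGG", "CCCC"]))
```

Each junction reads TCTAGT, which is recognised by neither XbaI nor SpeI, so only the newest 3' XbaI site is cleaved in each activation step.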

Use of the in vitro technology described above drastically reduces
the time it takes to assemble predetermined or randomly shuffled genes.
Also, the possibility of continuing with the assembly process while having
numerous different assembly arrays attached to the beads, and splitting
and mixing the beads between each unit/module addition from a library of
units/modules, results finally in the generation of a cascade of different
assemblies (Figure 10). These assembled genes can then be cloned
simultaneously and expressed in a suitable host. An assay system can
then be used to identify those assembled genes that yield bio-active
compounds.
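
The size of a split-and-mix library grows as the number of library units raised to the power of the number of addition rounds. The sketch below enumerates the assemblies for a small hypothetical library; the unit names and the function name are illustrative only.

```python
from itertools import product


def split_and_mix(first_unit, library, rounds):
    """Enumerate every ordered assembly after `rounds` split-and-mix additions."""
    return ["-".join((first_unit,) + combo)
            for combo in product(library, repeat=rounds)]


if __name__ == "__main__":
    assemblies = split_and_mix("LM", ["KS", "AT", "ACP"], rounds=2)
    print(len(assemblies))   # len(library) ** rounds
    print(assemblies[:3])
```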

Example 3: Retrobiosynthetic synthesis of a target molecule

A strategy employing the invention in order to construct the highly
potent anti-breast cancer drug discodermolide, the anticholesterol
compound decarestrictine, and the antitumour compound octalactin using
polyketide synthase domains/modules is outlined below.

Discodermolide

The drug discodermolide (Figure 18), isolated from the marine sponge
'Discodermia dissoluta', has been identified as a highly potent anti-cancer
compound and 80 times more effective than the well known anticancer
drug Taxol (TerHarr et al., 1996). It has the same mechanism of action as
Taxol, even though it is structurally different from the latter.

One can infer from its structure (Figure 18) that discodermolide is a
polyketide and can therefore be constructed from a system that has the
basic enzymatic building blocks (domains and modules) that make other
polyketides like erythromycin and rapamycin. Having predicted that
approximately 45 domains housed in 12 modules would be required in
order to carry out the chemistry that accounts for the functionalities on the
carbon skeleton of discodermolide, one can now begin to construct such a
system. All one has to do is to identify the type and nature of the
domains/modules that one requires to generate the observed
functionalities, and then assemble these units in the desired order (Figure
18). The resulting DNA assembly can then be put into a bacterial strain that
makes a functional polyketide synthase.

Until now, it would have been exceedingly difficult, if not impossible,
to assemble 45 or so pieces of DNA in the wanted order, for several
reasons. Firstly, one would have to look for two different restriction
enzymes every time one needed to assemble two DNA segments. This is
because if one uses just one restriction enzyme at either end of the
domain, the already-assembled piece/pieces of DNA would be cleaved
from the assembly every time one decided to insert a new domain.
Secondly, in GC-rich DNA like that of the polyketide synthase producing
Streptomyces strains, unique restriction enzyme sites are few and far
between. To a molecular biologist, the task of assembling 40 pieces of
DNA with the limitations mentioned above would seem an insurmountable
one. One would rather attempt to isolate the genes that make the drug in
the first place than consider carrying out "step-by-step" reconstruction of
the gene itself. In the case of discodermolide, even the last possibility is in
the realms of fantasy. The organism within the marine sponge that makes
the drug has not been identified. The only way discodermolide can be
made available is through chemical synthesis - there have been a few
chemical routes reported in the literature recently (Marshall and Johns, 1998
and references therein). However, as is the case with most other complex
molecules, large scale production of discodermolide using the chemical
route would turn out to be outrageously expensive. Chemists have been
using the retrosynthetic analysis approach towards total synthesis of
important bioactive molecules. This approach breaks the target compound
into many smaller pieces - easily synthesised - which are then re-
assembled.

The type of polyketide or other synthetic enzyme domains required
in order to construct the target molecule from the starting units are
identified using a "retrobiosynthetic analysis" approach for discodermolide,
by matching which molecules need to be condensed to form the
macromolecule with the enzyme domains that carry out the required
catalysis to build the macromolecule.

Having identified the enzyme units that are required, the unit-DNA
segments are amplified using the polymerase chain reaction (PCR) from
the library of existing polyketide synthase unit-DNA, and the appropriate
recognition sequences are attached to each unit-DNA fragment. All of the
unit fragments are then replicated in a dam- strain whereby both the
unmodified and modified sequences (5'TCTAGA3' and 5'GATCTAGA3'
respectively) are cleaved by the restriction enzyme XbaI.
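
The distinction between the two XbaI sequence contexts can be expressed as a small classifier. This is a sketch under assumptions: it uses the standard dam recognition sequence GATC, treats any XbaI site that overlaps a GATC as blocked in DNA from a dam+ host, and uses hypothetical function and parameter names. Because DNA is double-stranded, the context TCTAGATC on the strand being scanned is the same duplex as GATCTAGA read on the complementary strand, so both are flagged.

```python
import re


def classify_xbai_sites(sequence: str, dam_methylated: bool = True):
    """Return (position, status) for each XbaI site (TCTAGA) in `sequence`.

    A site is reported as 'blocked' when it overlaps a dam site (GATC) and
    the DNA was prepared from a dam+ host; otherwise it is 'cleavable'.
    """
    sequence = sequence.upper()
    results = []
    for match in re.finditer(r"(?=TCTAGA)", sequence):
        start = match.start()
        upstream = sequence[max(0, start - 2):start]     # GA + TCtaga -> GATC
        downstream = sequence[start + 6:start + 8]       # tctaGA + TC -> GATC
        overlaps_dam = upstream == "GA" or downstream == "TC"
        blocked = dam_methylated and overlaps_dam
        results.append((start, "blocked" if blocked else "cleavable"))
    return results


if __name__ == "__main__":
    construct = "GGTCTAGACCGATCTAGAGG"
    print(classify_xbai_sites(construct, dam_methylated=True))
    print(classify_xbai_sites(construct, dam_methylated=False))
```

In a dam- host both sites are reported as cleavable, which corresponds to the step above in which all unit fragments are released with XbaI.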

Having constructed this library of appropriate PKS or other synthetic
enzyme units, the corresponding DNA units are then assembled. The
assembled DNA piece is then placed in a vector, so that it can be inserted
in a bacterial strain to yield the desired synthetic protein. Suitable vectors
have an antibiotic resistance marker (for selection of this vector on an
antibiotic-rich medium) and an "origin-of-replication" (ori). Ori is essential for
the independent growth of the vector in any strain. Particularly suitable
vectors for the expression of the synthetic enzymes of the invention are the
actinomycete vectors described by Rowe et al. (1998).
The strain is then grown in a medium that is supplemented with the
antibiotic, the resistance gene for which is present in the vector.

Figures 4 and 5 show how the assembly proceeds. The first domain
is inserted into a vector that is cut by cleavage with XbaI. After the ligation
of the domain has taken place with the vector, the DNA is put in a bacterial
strain that is dam+ and grown. Finally, bacterial colonies that have the
desired vector-domain DNA are identified and DNA isolated from them. The
whole procedure is cheap and fast. Only one restriction enzyme (XbaI) is
made use of, routine cloning technology is employed, and the desired DNA
fragment is obtained, which can then be expressed in a Streptomyces
strain to yield the polyketide synthase.

The in vivo "domain-by-domain" construction of the discodermolide-
producing polyketide synthase would take approximately 55 days via this
method. In comparison, assembly of modules would take less time, as one
would need to assemble fewer pieces. Most importantly, once the synthase
is shown to be functionally active, a large fermentation of the bacterial
strain can be carried out, and the drug isolated in whatever quantity
one requires - unlike the chemical route, where the starting materials have
to be freshly synthesised every time one requires the target compound.
Employing such a strategy would lead to a quick and inexpensive synthesis
of important bioactive molecules like discodermolide.

Retrobiosynthetic analysis

The whole approach (retrobiosynthetic analysis followed by
identification of PKS units, followed by assembly of PKS units) is made
clearer in the following two examples.

Octalactin

A new addition to the rare class of eight-membered lactone natural
products is the octalactin family. Octalactin A and B (Figure 20) are
natural products isolated from the marine gorgonian octocoral 'Pacifigorgia
sp.' (Tapiolas et al., 1991). Octalactin A shows very strong cytotoxicity
toward B-16-F-10 murine melanoma and HCT-116 human colon tumour
cell lines and is a promising drug candidate, while octalactin B displayed
no such activity (Tapiolas et al., 1991). Total syntheses of both octalactin A
and B have been reported in the literature. One such synthesis (Buszek et al.,
1994) typically involves more than 12 chemical steps in leading to the
target molecules. Clearly, large-scale production of octalactins using
chemical synthesis is industrially not viable. On the other hand, the genes
that code for the enzymes that make octalactins have not been identified or
isolated. This means that at present, modified octalactins can only be made
using chemical synthesis. A gene is constructed from the available PKS
spare parts that would code for the enzymes that would make octalactin
B. Octalactin B can then be converted into the cytotoxic octalactin A by
one-step stereospecific epoxidation. Also, once the gene for octalactin B is
constructed and shown to make the octalactin PKS, genetic engineering on
this gene would yield modified octalactin PKSs that in turn would
synthesise octalactin analogues.

Clearly a polyketide, the carbon skeleton of octalactin B (Figure 19)
can be seen to be assembled from acetate and propionate units. The uptake
and assembly of these units in the prescribed sequence, as well as the
functionalities that decorate the carbon chain of octalactin, can be assigned
to various PKS modules (see Figure 19). Once a decision has been made
regarding the type and nature of PKS modules, they can be strung together
to make a gene using the invention. This gene can then be expressed in a
suitable host in order to look for octalactin B production. The
retrobiosynthetic approach towards octalactin is shown in detail in Figure
19. A choice of what modules to select from the PKS module library is
followed by amplification of the modular DNA fragments using
oligonucleotides such that the 5' and the 3' ends of every DNA fragment
have the restriction enzyme recognition sites stated under the description
of the invention. The choice of modules that, when assembled, would make
the 'octalactin gene' is displayed as a schematic representation in Figure
20.

Decarestrictine J

The molecule decarestrictine J can be synthesised using the
retrobiosynthetic approach. Decarestrictine J is a ten-membered lactone
that comes from the family of decarestrictines, shown to display strong anti-
cholesterol activity (Grabley et al., 1992). The total synthesis of
decarestrictine J has been reported and involves numerous chemical steps
(Yamada et al., 1995). The target molecule (Figure 21) can be conceived to
be formed by assembly of five acetate polyketide units. Using the
retrobiosynthetic approach, one can identify the PKS domains/modules that
would be required for the carbon skeleton of decarestrictine J. A
hypothetical decarestrictine PKS is shown in Figure 21. The loading module,
as well as the four internal modules along with the TE domains, can be
conveniently assembled using the invention. The assembled
'decarestrictine gene' can then be expressed in a suitable host in order to
check for the production of decarestrictine J.

In summary, the retrobiosynthetic approach involves the following
steps:

a) Identification of the number and nature of carbon units that make up the
target molecule.

b) Identification of the modules/domains from libraries of
polyketide/peptide synthetase/fatty acid/etc. encoding units that are
responsible for the uptake of the said carbon units and the nature and
degree of functionalisation of the carbon chain.

c) Assembly of the said modules/domains using the methods of the
invention.

d) Expression of the assembled gene in a suitable expression host.
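
Steps a) to c) can be illustrated with a minimal planning sketch. It is illustrative only: the mapping from the desired β-carbon state to reductive domains (KR, DH, ER) follows general polyketide synthase logic rather than anything specified in the text, the AT-ACP loading module and the domain order within a module are assumptions, and all names are hypothetical.

```python
# Assumed mapping from the desired state of the beta-carbon after each chain
# extension to the reductive domains needed in that module.
REDUCTIVE_DOMAINS = {
    "keto": [],
    "hydroxyl": ["KR"],
    "alkene": ["DH", "KR"],
    "alkane": ["DH", "ER", "KR"],
}


def plan_synthase(extension_steps):
    """Map each chain-extension step to a module, given as a list of domains.

    `extension_steps` lists the desired beta-carbon state after each
    extension, in biosynthetic order. The first module returned is an
    assumed AT-ACP loading module; a TE domain is appended to the last one.
    """
    modules = [["AT", "ACP"]]  # loading module (assumption)
    for beta_carbon_state in extension_steps:
        reductive = REDUCTIVE_DOMAINS[beta_carbon_state]
        modules.append(["KS", "AT", *reductive, "ACP"])
    modules[-1].append("TE")
    return modules


if __name__ == "__main__":
    # Two extensions, each leaving a hydroxyl: the DEBS1-TE domain layout.
    for module in plan_synthase(["hydroxyl", "hydroxyl"]):
        print("-".join(module))
```

With two hydroxyl-forming extensions the sketch reproduces the DEBS1-TE layout used in Example 1 (loading module, KS1-AT1-KR1-ACP1, KS2-AT2-KR2-ACP2, TE).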

Example 4: Transforming strains with DNA encoding similar synthetic
enzyme domains

A method for transforming expression strains with DNA encoding similar
synthetic enzyme domains has been devised. Instead of using the TE PKS
DNA fragment as a region of integration from the assembled gene into a
streptomyces host (S. erythraea/JC2, Rowe et al., 1998), a mutated recA
gene fragment from streptomyces is used. The assembly process is carried
out in a recA- E. coli strain (e.g. DH10B) as previously described. As this
strain is recA-, one can assemble any number of identical DNA units. The
vector, into which the assembled gene is being constructed, contains a
portion of a streptomyces recA gene. This recA fragment carries a
mutation. After the synthetic enzyme gene has been assembled, the vector
is used to transform a streptomyces host (e.g. S. lividans or S. erythraea).
The fragment of recA gene carrying a mutation recombines with the recA
gene of the streptomyces host, abolishing the functional recA gene and
making the strain recombination minus (Figure 11). This means that an
event such as the one described in Figure 2 is now not possible. The strain
is then grown to look for the encoded enzyme product. This strategy is
tested by assembling a functional PKS gene having more than one type of
identical DNA units (Figure 12).

Construction of the PKS multienzyme recDEBS1-TE

RecA protein has been characterised as a multifunctional enzyme that is
essential for homologous recombination, DNA repair, SOS response and
DNA rearrangements (Miller and Kokjohn, 1990). Most of the routinely used
strains of E. coli are recA-. The gene for recA has been identified from
many streptomyces strains. The first streptomyces recA gene to be
characterised and isolated was from S. lividans (Nußbaumer and
Wohlleben, 1994). RecA mutants have since been generated in S.
ambofaciens (Aigle et al., 1997). The streptomyces recA protein has
approximately 372 amino acid residues (Figure 13). DNA sequence
analysis suggests a coding region of 1122 bp, which is found to be highly
conserved within streptomyces (Figure 14). In fact, the recA mutants of S.
ambofaciens were generated by integrating a mutated portion of the S.
lividans recA gene into the S. ambofaciens host. It was found that a recA
mutant lacking 30 aa from the C-terminus of the protein inhibited
recombination events in S. ambofaciens (Aigle et al., 1997).

A recA mutant of the streptomyces host that is used for expression
of the assembled gene was generated.
The oligonucleotides:

5'-GGTCTAGAATTCGGCAAGGGCGCCGGTCATGCGCAT-3' and
5'-GGTCTAGATCTGCGGCGTCGGCCGGGGCGGCGGAGGCG-3'

were used as the forward and reverse primers respectively and the 1000
bp internal region of S. lividans recA gene (Nußbaumer and Wohlleben,
1994) was amplified using Pfu polymerase. An additional nucleotide (C)
was incorporated into the forward primer to generate a frame shift in the
amplified recA gene fragment. The PCR product was cloned in pUC-18
vector and sequenced to detect possible errors during PCR. The 1.0
kbp recA fragment, flanked at both ends by an XbaI site, was then inserted
in the expression vector pCJR24 that has a unique XbaI site. The ligation
mixture was used to transform E. coli DH10B cells and the desired plasmid
DNA isolated. The resulting plasmid (pARecA24) contains a non-
methylated XbaI site at the 5' end of the recA gene fragment. The ten PKS
DNA units, namely, TE, two each of ACP1, KR1, AT1 & KS1, and LM were
inserted into the plasmid pARecA24 to finally yield the expression plasmid
pRecADUE. This plasmid was used to transform wild-type S. lividans
protoplasts, and thiostrepton resistant colonies were grown in defined liquid
media as described above. The compound (Figure 12) was isolated from
the bacterial broth and chemically characterised.
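
The effect of the single additional nucleotide in the forward primer can be shown on a toy sequence. The sequence below is made up for illustration and is not taken from recA; the function names are hypothetical.

```python
def codons(sequence: str, frame_start: int = 0):
    """Split `sequence` into complete triplets starting at `frame_start`."""
    usable = sequence[frame_start:]
    whole = len(usable) - len(usable) % 3
    return [usable[i:i + 3] for i in range(0, whole, 3)]


def insert_base(sequence: str, position: int, base: str = "C") -> str:
    """Return `sequence` with `base` inserted before index `position`."""
    return sequence[:position] + base + sequence[position:]


if __name__ == "__main__":
    original = "ATGGCCAAGTTT"              # toy sequence, not from recA
    shifted = insert_base(original, 3)     # one extra C after the first codon
    print(codons(original))
    print(codons(shifted))
```

Every codon downstream of the insertion point changes, which is why a one-nucleotide insertion is sufficient to disrupt the reading frame of the amplified fragment.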

Thus, it has been shown that a gene carrying interspaced DNA units
that are identical in structure as well as function does not lead to internal
recombination events, as the native recA gene of the streptomyces host
has been disrupted. Furthermore, it has been shown that it is possible to
use identical domains to reach the objective of generating hybrid synthetic
enzyme systems. This strategy will greatly reduce the number of domains
that otherwise have to be employed for the purposes of de novo PKS gene
assembly that yields the desired chemical compounds. The inventors have
established a set of 12 domains that are capable of functioning robustly
and are independent of flexibility and spatial constraints - problems that
beset the choice of domains and modules previously.

CLAIMS

1. A method of assembling several DNA units in sequence in a
DNA construct, which method comprises the steps of

a) providing each DNA unit with a restriction enzyme recognition
sequence at its 5' end and with a recognition sequence for the same
restriction enzyme at its 3' end that is combined with a recognition site for a
DNA modification enzyme,

b) providing a starting DNA construct having an accessible
restriction site for the same or a compatible restriction enzyme and cleaving
the starting DNA construct with such a restriction enzyme,

c) inserting the desired DNA unit and bringing the ligated
product into contact with a DNA modification enzyme such that the
restriction site at the 3' end of the inserted DNA unit is abolished,

d) cleaving the ligated product at an accessible unmodified
recognition site for the same or a compatible restriction enzyme,

e) repeating steps c) and d) to introduce each desired DNA unit
to give a DNA construct containing all the desired units in sequence.

2. The method of claim 1 wherein the DNA modification enzyme
is a methylase.

3. The method of claim 2 wherein the methylase is the dam
methylase of Escherichia coli.

4. The method of claim 3 which comprises the steps of

a) providing each DNA unit with an XbaI recognition sequence
5'XXTCTAGA3' (where XX is not GA) at its 5' end and with an XbaI
recognition sequence 5'GATCTAGA3' at its 3' end,

b) providing a starting DNA construct having an accessible XbaI
site and cleaving the starting DNA construct with XbaI,

c) inserting the desired DNA unit and using a resulting ligated
product to transform a dam+ strain of E. coli,

d) recovering a resulting plasmid and cleaving the plasmid at an
accessible XbaI site with XbaI,

e) repeating steps c) and d) to introduce each desired DNA unit
to give a DNA construct containing all the desired units in sequence.

5. The method of any one of claims 1 to 4, wherein the
recognition sequences for the restriction enzyme and the DNA modification
enzyme are created in the DNA units prior to cutting with the restriction
enzyme.

6. The method of claim 5 wherein the restriction sites are
created in the fragment by means of a primer extension reaction.

7. The method of any one of claims 1 to 6, wherein the DNA
construct is an expression vector capable of facilitating expression of the
protein encoded by the desired DNA units.

8. The method of claim 3 or claim 4, wherein the DNA
modification is removed and the restriction site re-established by replicating
the ligated product in a dam- strain of E. coli by means of a suitable vector.

9. A method of making an assembly of several DNA units in
sequence which method comprises the steps of:

a) providing a first DNA unit with a recognition sequence for a
first restriction enzyme at its 3' end, and cleaving the said first DNA unit
with said first restriction enzyme,

b) providing each other DNA unit with a recognition sequence at
its 5' end for a second restriction enzyme which has a compatible ligation
sequence with that of the first restriction enzyme, and a downstream
recognition sequence for said first restriction enzyme followed by a
downstream recognition sequence for a third restriction enzyme at its 3'
end, and cleaving each said other DNA unit with the second and third
restriction enzymes,

c) ligating the said first DNA unit with a desired other DNA unit
to form a ligated product such that the ligation of the two units abolishes the
recognition site for the first restriction enzyme at the ligation junction, and
cleaving the ligated product with said first restriction enzyme,

d) ligating the product from c) with a desired DNA unit from b) to
form a ligated product and cleaving the ligated product with said first
restriction enzyme,

e) repeating step d) with each other DNA unit in turn so as to
assemble the DNA units in sequence.
10. The method of claim 9 which method comprises the steps of:

a) providing a first DNA unit with an XbaI recognition sequence
5'TCTAGA3' at its 3' end, and cleaving the said first DNA unit with XbaI,

b) providing each other DNA unit with a SpeI recognition
sequence 5'ACTAGT3' at its 5' end, and a downstream XbaI recognition
sequence 5'TCTAGA3' followed by a downstream SmaI recognition
sequence 5'CCCGGG3' at its 3' end, cleaving each said other DNA unit
with SpeI and SmaI, and dephosphorylating the 5' end of the cleaved DNA
unit,

c) ligating the said first DNA unit with a desired other DNA unit
to form a ligated product and cleaving the ligated product with XbaI,

d) ligating the product from c) with a desired DNA unit from b) to
form a ligated product and cleaving the ligated product with XbaI,

e) repeating step d) with each other DNA unit in turn so as to
assemble the DNA units in sequence.
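
The cut-and-ligate cycle of claim 10 can be sketched in Python to show why the ligation junction is no longer cleavable. This is a minimal illustration on the top strand only, not part of the claimed method: the payload sequences (`ATGAAACCC`, `GGTGGT`, `CATCAT`) and all helper names are hypothetical, and the sketch assumes the usual cut positions T^CTAGA (XbaI), A^CTAGT (SpeI) and CCC^GGG (SmaI).

```python
XBAI, SPEI, SMAI = "TCTAGA", "ACTAGT", "CCCGGG"


def keep_upstream(seq, site, cut_offset):
    """Cut at the last occurrence of `site` and keep the upstream fragment."""
    i = seq.rfind(site)
    if i < 0:
        raise ValueError(f"no {site} site found")
    return seq[: i + cut_offset]


def keep_downstream(seq, site, cut_offset):
    """Cut at the first occurrence of `site` and keep the downstream fragment."""
    i = seq.find(site)
    if i < 0:
        raise ValueError(f"no {site} site found")
    return seq[i + cut_offset:]


def make_unit(payload):
    """Hypothetical 'other' unit: SpeI site, payload, XbaI site, SmaI site."""
    return SPEI + payload + XBAI + SMAI


def add_unit(growing, unit):
    """One cycle: XbaI-cut the growing assembly, SpeI/SmaI-cut the unit, ligate."""
    left = keep_upstream(growing, XBAI, 1)      # ends ...T (CTAG overhang)
    insert = keep_downstream(unit, SPEI, 1)     # starts CTAGT...
    insert = keep_upstream(insert, SMAI, 3)     # blunt end ...CCC
    return left + insert


first_unit = "ATGAAACCC" + XBAI
step1 = add_unit(first_unit, make_unit("GGTGGT"))
step2 = add_unit(step1, make_unit("CATCAT"))
print(step2)
```

Each junction reads TCTAGT, which is neither an XbaI nor a SpeI site, so only the XbaI site carried in by the newest unit remains available for the next cycle.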

11. The method of claim 9 or claim 10 wherein the assembly
occurs via stepwise addition of fragments to a vector.

12. The method of claim 9 or claim 10 wherein the said first DNA
unit is attached to the solid phase for use in step c).

13. The method of claim 12, wherein the solid phase is split and
mixed between steps c), d), and e) to make several different assemblies.




14. The method of any one of claims 9-13, wherein the
recognition sequences in one or more of the DNA units are introduced by
means of extension primers.

15. The method of any one of claims 9-14 wherein the assembly
of several DNA units is inserted into an expression vector which is used to
transform a host capable of expressing the protein encoded by the vector.

16. The method of any one of claims 1-15, wherein one or more
of the DNA units encodes a catalytic or transport protein domain (see
Kleinkauf peptide/polyketide systems paper).

17. The method of claim 16 wherein one or more of the DNA
units are derived from polyketide synthesising enzyme domain DNA
sequences.

18. The method of claim 16 wherein one or more of the DNA
units are derived from peptide synthesising enzyme domain DNA
sequences.

19. The method of claim 16 wherein one or more of the DNA
units are derived from hybrid peptide polyketide enzyme domain DNA
sequences.

20. The method of claim 16 wherein one or more of the DNA
units are derived from fatty acid synthesising enzyme domain DNA
sequences.

21. The method of claim 16 wherein one or more of the DNA
units encode modules comprising one or more catalytic or transport
domains.
22. DNA constructs incorporating one or more DNA assemblies
encoding synthetic enzymes made by any one of the methods of
claims 1-21.

23. Synthetic enzymes encoded by one or more DNA assemblies
made by the methods of any one of claims 1-21.

24. Hosts expressing DNA constructs encoding one or more
synthetic enzymes made by any one of the methods of claims 1-21.

25. Hybrids of transformed hosts expressing one or more DNA
constructs encoding synthetic enzymes incorporating a DNA assembly
made by any one of the methods of claims 1-21.

26. Compounds produced by synthetic enzymes encoded by
DNA assemblies made by any one of the methods of claims 1-21.

27. A method of synthesising a target molecule comprising the
steps of

a) examining the composition and stereochemistry of a target
molecule,

b) determining which catalytic and transport domains need to be
present in a synthetic enzyme in order to catalyse the synthesis of the
target molecule,

c) using any one of the methods of claims 1-21 to assemble the
required DNA units encoding the catalytic and transport domains into a
DNA assembly that encodes said synthetic enzyme which is capable of
synthesising the target molecule,

d) placing the DNA assembly into a vector to allow expression of
the synthetic enzyme in a host capable of synthesising the target molecule
after transformation with said vector.

28. The method of claim 27 wherein the transformed host is
tested for the presence of the target molecule after step d).

29. The transformed host of claim 27.

30. Use of transformed host of claim 27 to produce said target
molecule.

31. A method of making a synthetic enzyme to catalyse the
synthesis of a target molecule comprising the steps of

a) examining the composition and stereochemistry of a target
molecule,

b) determining which catalytic and transport domains need to be
present in the synthetic enzyme in order to catalyse the synthesis of the
target molecule,

c) using any one of the methods of claims 1-21 to assemble the
required DNA units encoding the catalytic and transport domains into a
DNA assembly that encodes an enzyme which is capable of synthesising
the target molecule,

d) expressing the DNA assembly in a suitable host to produce
the enzyme.

32. A library of DNA units encoding catalytic or transport protein
domains, wherein each DNA unit has a recognition sequence for a
restriction enzyme at its 5'-end and a second recognition sequence for the
same or a compatible enzyme at its 3'-end which incorporates a
recognition sequence for a DNA modifying enzyme.

33. The library of claim 32, wherein each DNA unit has an XbaI
recognition sequence 5'XXTCTAGA3' (where XX is not GA) at its 5'-end
and an XbaI recognition sequence 5'GATCTAGA3' at its 3'-end.

34. A library of DNA units encoding catalytic or transport protein
domains, wherein each DNA unit has a recognition sequence at its 5' end
for a first restriction enzyme, and a downstream recognition sequence for a
second restriction enzyme followed by a downstream recognition
sequence for a third restriction enzyme at its 3' end, such that the DNA
units, once restricted by the first and second restriction enzymes, can be
ligated together to abolish the restriction sites at the ligation junction.

35. The library of claim 34, wherein each DNA unit has a SpeI
recognition sequence 5'ACTAGT3' at its 5'-end, and a downstream XbaI
recognition sequence 5'TCTAGA3' followed by a downstream SmaI
recognition sequence 5'CCCGGG3' at its 3'-end.

34. The library of claim 32 or claim 34, wherein the DNA units
encode polyketide synthetic domains, comprising two KS domains, at least
two AT domains, two KR domains, two DH domains, two ER domains, an
ACP domain and a TE domain.
35. A module comprising a DNA sequence encoding a functional
set of polyketide synthetic domains wherein the module has a recognition
sequence for a restriction enzyme at its 5'-end and a second recognition
sequence for the same or a compatible enzyme at its 3'-end which
incorporates a recognition sequence for a DNA modifying enzyme.

36. The module as claimed in claim 35, wherein the module has
an XbaI recognition sequence 5'XXTCTAGA3' (where XX is not GA) at its
5'-end and an XbaI recognition sequence 5'GATCTAGA3' at its 3'-end.

37. A module comprising a DNA sequence encoding a functional
set of polyketide synthetic domains wherein the module has a recognition
sequence at its 5' end for a first restriction enzyme, and a downstream
recognition sequence for a second restriction enzyme followed by a
downstream recognition sequence for a third restriction enzyme at its 3'
end, such that the DNA units, once restricted by the first and second
restriction enzymes, can be ligated together to abolish the restriction sites at
the ligation junction.

38. The module as claimed in claim 37, wherein the module has a
SpeI recognition sequence 5'ACTAGT3' at its 5'-end, and a downstream
XbaI recognition sequence 5'TCTAGA3' followed by a downstream SmaI
recognition sequence 5'CCCGGG3' at its 3'-end.

39. A module as claimed in claim 35 or claim 37, wherein the
DNA units encode polyketide synthetic domains, comprising two KS
domains, at least two AT domains, two KR domains, two DH domains, two
ER domains, an ACP domain and a TE domain.

40. A vector containing one or more modules as claimed in claim
35 or claim 37.






41. The vector as claimed in claim 40, wherein a non-functional
recA gene is also present.

42. A method of transforming a host with one or more synthetic
DNA assemblies encoding enzyme domains which comprises the steps of:

a) inserting said DNA assembly into a vector containing a
mutated internal fragment of a recA gene sequence such that the vector is
capable of undergoing homologous recombination with the recA gene of
the host,

b) bringing said vector into contact with a host chromosome
under conditions which permit homologous recombination to take place,

c) disrupting the host recA gene by the integration of the DNA of
said vector into the chromosome.

43. The method of claim 42 wherein the expression vector is
used to transform a Streptomyces host.

44. The method of claim 42 or claim 43, wherein the DNA
assemblies are modules according to claim 35 or claim 37.

45. A host lacking a recA function, transformed with a vector
containing one or more modules according to claim 35 or 37.

46. A kit containing DNA units, DNA modules, vectors, DNA
manipulation hosts, DNA modification hosts, expression hosts, or solid
phase elements for use in the methods claimed herein.
[Sequence alignment figure: an amino acid alignment followed by a nucleotide alignment. The match lines and the label of the upper sequence are not legible in the source; the lower sequence is labelled S.a. The amino acid alignment is incomplete beyond residue 100, and the lower row of its 51-100 block is missing.]

         1 MAGTDREKALDAALAQIERQFGKGAVMRMGDRTNEPIEVIPTGSTALDVA 50
S.a.     1 MAGTDREKALDAALAQIERQFGKGAVMRMGDRSKEPIEVIPTGSTALDVA 50

        51 LGVGGIPRGRVVEVYGPESSGKTTLTLHAVANAQKAGGQVAFVDAEHALD 100

         1 ATGGCAGGAACCGACCGCGAGAAGGCCCTGGACGCCGCGCTCGCACAGAT 50
S.a.     1 ATGGCAGGAACCGACCGCGAGAAGGCTCTTGACGCCGCACTCGCACAGAT 50

        51 TGAACGGCAATTCGGCAAGGGCGCGGTCATGCGCATGGGTGACCGGACCA 100
        51 TGAACGGCAGTTCGGCAAGGGCGCGGTCATGCGCATGGGCGACCGGTCGA 100

       101 ACGAGCCCATCGAGGTCATCCCGACCGGGTCTACCGCGCTCGACGTGGCC 150
       101 AGGAGCCCATCGAGGTCATCCCGACCGGGTCGACCGCGCTCGACGTGGCC 150

       151 CTCGGCGTCGGAGGCATCCCGCGTGGCCGTGTCGTGGAGGTCTACGGCCC 200
       151 CTCGGCGTCGGCGGCCTGCCGCGCGGCCGCGTCATCGAGGTCTACGGTCC 200

       201 CGAGTCCTCGGGCAAGACGACCCTGACCCTGCACGCGGTGGCGAACGCGC 250
       201 GGAGTCCTCCGGTAAGACGACCCTGACCCTGCACGCCGTGGCGAACGCGC 250

       251 AGAAGGCCGGCGGCCAGGTCGCGTTCGTGGACGCCGAGCACGCCCTCGAC 300
       251 AGAAGGCCGGCGGCCAGGTGGCGTTCGTGGACGCGGAGCACGCCCTCGAC 300

       301 CCCGAGTACGCGAAGAAGCTCGGTGTCGACATCGACAACCTGATCCTGTC 350
       301 CCCGAGTACGCCCAGAAGCTCGGCGTCGACATCGACAACCTGATCCTGTC 350

       351 CCAGCCGGACAACGGTGAGCAGGCCCTGGAGATCGTGGACATGCTGGTCC 400
       351 CCAGCCGGACAACGGTGAGCAGGCCCTGGAGATCGTGGACATGCTGGTpC 400

       401 GCTCCGGCGCCCTCGACCTCATCGTCATCGACTCCGTCGCCGCGCTCGTC 450
       401 GCTCCGGCGCCCTCGACCTCATCGTCATCGACTCCGTCGCCGCGCTCGTC 450

       451 CCGCGCGCGGAGATCGAGGGCGAGATGGGCGACAGCCACGTCGGTCTGCA 500
       451 CCGCGCGCGGAGATCGAGGGCGAGATGGGTGACAGCCACGTCGGTCTCCA 500

       501 GGCCCGGCTGATGAGCCAGGCCCTGCGGAAGATCACCAGCGCGCTCAACC 550
       501 GGCCCGGCTGATGAGCCAGGCGCTCCGGAAGATCACCAGCGCGCTCAACC 550

       551 AGTCCAAGACCACCGCGATCTTCATCAACCAGCTCCGCGAGAAGATCGGC 600
       551 AGTCCAAGACCACCGCGATCTTCATCAACCAGCTCCGCGAGAAGATCGGC 600

       601 GTGATGTTCGGCTCCCCGGAGACCACGACCGGTGGCCGGGCACTGAAGTT 650
       601 GTCATGTTCGGCTCCCCGGAGACCACGACCGGTGGCCGGGCGCTCAAGTT 650

       651 CTACGCCTCGGTGCGACTCGACATCCGGCGTATCGAGACGCTGAAGGACG 700
       651 CTACGCCTCGGTGCGACTCGACATCCGACGCATCGAGACGCTCAAGGACG 700

       701 GCACCGACGCGGTCGGCAACCGCACCCGCGTCAAGGTGGTCAAGAACAAG 750
       701 GCACCGACGCGGTCGGCAACCGCACGCGCGTCAAGGTCGTCAAGAACAAG 750

       751 GTCGCGCCGCCCTTCAAGCAGGCCGAGTTCGACATCCTCTACGGCCAGGG 800
       751 GTCGCGCCGCCCTTCAAGCAGGCCGAGTTCGACATCCTCTACGGCCAGGG 800

       801 CATCAGCCGCGAGGGCGGTCTGATCGACATGGGCGTGGAGAACGGCTTCG 850
       801 CATCAGCCGCGAGGGCGGCCTGATCGACATGGGCGTGGAGCACGGCTTCG 850

       851 TCCGCAAGGCCGGCGCCTGGTACACGTACGAGGGCGACCAGCTCGGTCAG 900
       851 TCCGCAAGGCCGGCGCCTGGTACACGTACGAGGGCGACCAGCTCX3GCCAG 900

       901 GGCAAGGAGAACGCGCGCAACTTCCTGAAGGACAACCCCGACCTGGCCAA 950
       901 GGCAAGGAGAACGCGCGCAACTTCCTGAAGGACAACCCCGACCTCGCCAA 950

       951 CGAGATCGAGAAGAAGATCAAGCAGAAGCTGGGCGTCGGCGTGCACCCCG 1000
       951 CGAGATCGAGAAGAAGATCAAGGAGAAGCTGGGCGTCGGAGTCCGTCCCG 1000

      1001 AGGA...GTCGGCCACCGAGCCCGGCGCGGACGCCGCCTCCGCCGCCCCG 1047
      1001 AGGAGCCGACGGCCACCGAGTCCGGACCGGA.........CGCCGCGACG 1041

      1048 GCCGACGCCX5CACCGGCGGTGCCCGCACCCACGACCGCCAAGGCCACCAA 1097
      1042 GCCGAATCCGCACCGGCGGTGCCCGCGCCCGCGACCGCCAAGGTCACCAA 1091

      1098 GTGCAAGGCCGCGGCAGCCAAGAGCTGA 1125
      1092 GGCCAAGGCCGCGGCAGCCAAGAGCTGA 1119
Box I Observations where certain claims were found unsearchable (Continuation of item 1 of first sheet)

This International Search Report has not been established in respect of certain claims under Article 17(2)(a) for the following reasons:



2. | J Claims Nos.: 

because they relate to parts of the International Application that do not comply with the prescribed requirements to such 
an extent that no meaningful International Search can be carried out, specifically: 



3. | | Claims Nos.: 

because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 6.4(a). 



Box II Observations where unity of invention is lacking (Continuation of item 2 of first sheet)



This International Searching Authority found multiple inventions in this international application, as follows: 



see additional sheet 



1 • As all required additional search fees were timely paid by the applicant, this International Search Report covers all 
Ui - J searchable claims. 



2. L J As all searchable claims could be searched without effort justifying an additional fee, this Authority did not invite payment 
of any additional fee. 



3. As only some of the required additional search fees were timely paid by the applicant this International Search Report 
1 — 1 covers only those claims for which fees were paid, specifically claims Nos.: 



4. | | No required additional search fees were timely paid by the applicant. Consequently, this International Search Report is

restricted to the invention first mentioned in the claims; it is covered by claims Nos.: 



Remark on Protest | | The additional search fees were accompanied by the applicant's protest.



1. 



Claims Nos.: 

because they relate to subject matter not required to be searched by this Authority, namely: 





No protest accompanied the payment of additional search fees. 
FURTHER INFORMATION CONTINUED FROM PCT/ISA/210



This International Searching Authority found multiple (groups of) 
inventions in this international application, as follows: 

1. Claims: (1-8, 32, 33, 37, 38)-complete, (16-31, 36, 41-43,
46-48)-partially



A method of assembling several DNA units in sequence in a 
DNA construct, which method comprises the step of: a) 
providing each DNA unit with a restriction enzyme 
recognition sequence at its 5' end and with a recognition 
sequence for the same restriction enzyme at its 3' end that 
is combined with a restriction site for a DNA modification 
enzyme, b) providing a starting DNA construct having an 
accessible restriction site for the same or a compatible 
restriction enzyme and cleaving the starting DNA construct 
with a restriction enzyme, c) inserting the desired DNA unit 
and bringing the ligated product into contact with a DNA
modification enzyme such that the restriction site at the 3'
end of the inserted DNA unit is abolished, d) cleaving the
ligated product at an accessible unmodified recognition site
for the same or a compatible restriction enzyme, e) 
repeating step c) and d) to introduce each desired DNA unit 
to give a DNA construct containing all the desired units in 
sequence; DNA construct incorporating one or more DNA 
assemblies encoding synthetic enzymes and/or hosts 
expressing DNA constructs made by said method; compounds 
produced by synthetic enzymes encoded by said DNA 
assemblies; a method of synthesising a target molecule using 
said method; a method of making a synthetic enzyme to 
catalyse the synthesis of a target molecule using said 
method; a library of DNA units encoding a catalytic or 
transport protein domains, wherein each DNA unit has a 
recognition sequence for a restriction enzyme at its 5'-end 
and a second recognition sequence for the same or a 
compatible enzyme at its 3'-end which incorporates a 
recognition sequence for a DNA modifying enzyme; 
a module comprising a DNA sequence encoding a functional set 
of polyketide synthetic domains wherein the module has a 
recognition sequence for a restriction enzyme at its 5'-end
and a second recognition sequence for the same or a 
compatible enzyme at its 3'-end which incorporates a 
recognition sequence for a DNA modifying enzyme; 
a method of transforming a host with one or more synthetic 
DNA assemblies encoding enzyme domains, wherein the DNA 
assemblies are said modules; 



2. Claims: (9-15, 33, 34, 39, 40)-complete, (16-31, 36, 41-43,
46-48)-partially



Idem as invention 1, but limited to a method of: assembling 
several DNA units in sequence in a DNA construct, which 
method comprises the step of: a) providing a first DNA unit 
with a recognition sequence for a first restriction enzyme
at its 3' end, and cleaving the said first DNA unit with
said first restriction enzyme, b) providing each other DNA
unit with a recognition sequence at its 5' end for a second
restriction enzyme which has a compatible ligation sequence
with that of the first restriction enzyme, and a downstream
recognition sequence for said first restriction enzyme
followed by a downstream recognition sequence for a third
restriction enzyme at its 3' end, and cleaving each said
other DNA unit with the second and third restriction
enzymes, c) ligating the said first DNA unit with a desired
other DNA unit to form a ligated product such that the
ligation of the two units abolishes the recognition site for
the first restriction enzyme at the ligation junction, and
cleaving the ligated product with said first restriction
enzyme, d) ligating the product from c) with a desired DNA
unit from b) to form a ligated product and cleaving the
ligated product with said first restriction enzyme,
e) repeating step d) with each other DNA unit in turn so as
to assemble the DNA units in sequence;



3. Claims: 44-45 

A method of transforming a host with one or more synthetic 
DNA assemblies encoding enzyme domains which comprises the 
step of: a) Inserting said DNA assembly into a vector 
containing a mutated internal fragment of a recA gene 
sequence such that the vector is capable of undergoing 
homologous recombination with the recA gene of the host, b) 
bringing said vector into contact with a host chromosome 
under conditions which permit homologous recombination to 
take place, c) disrupting the host recA gene by integration 
of the DNA of said vector into the chromosome; said method 
wherein the expression vector is used to transform a 
Streptomyces host; 
Identify polyketide units - e.g. whether acetate, propionate, etc.

Break-up and identify - break up the carbon skeleton and identify how many
such carbon units are present. Eight units would mean one requires eight
modules to make a PKS.

Choose - choose the modules or domains that would be required, from an
existing library of such PKS modules and domains.

Assemble - assemble the DNA units (modules/domains) using the invention.

Express - express the assembled gene in a host and check for compound
production.
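
The "break-up" and "choose" steps above can be sketched in Python as a simple lookup, one module per carbon unit. This is only an illustrative sketch: the library contents, the module names and the example unit list are hypothetical and are not taken from the application.

```python
from collections import Counter

# Hypothetical library: carbon-unit type -> name of a stored PKS module.
LIBRARY = {
    "acetate": "module_acetate",
    "propionate": "module_propionate",
}


def plan_pks(units, library=LIBRARY):
    """Return one module per carbon unit, in the order the units are listed."""
    missing = sorted(set(units) - set(library))
    if missing:
        raise KeyError(f"no module in the library for: {missing}")
    return [library[unit] for unit in units]


# Example: a skeleton broken up into eight units needs eight modules.
units = ["propionate"] + ["acetate"] * 7
plan = plan_pks(units)
print(len(plan), Counter(plan))
```

The plan is then the ordered list of DNA units to be assembled and expressed.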



Figure 20 shows a schematic representation of the hypothetical polyketide
synthase for synthesising octalactin B, assembled from enzyme units that
belong to various PKSs in the public domain.

Figure 21 shows a schematic representation of the hypothetical
decarestrictine polyketide synthase for synthesising the anti-cholesterol
compound decarestrictine J, assembled from enzyme units that belong to
various PKSs in the public domain.
Examples

Example 1: Vectorial assembly of DNA units

DNA units that are to be assembled contain the XbaI recognition
sequence at either end of the unit. At one of the ends, two nucleotides (GA)
are arranged at the 5' end of the XbaI recognition sequence (thus making it
5'GATCTAGA3'). This is achieved by first incorporating the XbaI
recognition sequences in the oligonucleotide primers and then amplifying
the desired DNA unit by PCR. The PCR products are then ligated to a
pUC18 vector, used to transform a dam⁺ strain of E. coli, and the clones
isolated and sequenced for possible errors in the PCR products. A dam⁺
strain of E. coli, such as DH10B™, methylates the nucleotide A in the
sequence GATCTAGA, as 5'GATC3' is a sequence that is recognised by
the product of the Dam methylase gene (Fujimoto et al., 1965; Geier et al.,
1979). This makes only one end of the DNA unit cleavable by XbaI. The
vector is then used to transform a dam⁻ strain of E. coli (e.g. ET12567;
MacNeil et al., 1992) and the plasmid DNA isolated. This DNA is now
cleavable at both ends of the DNA unit by XbaI. When a library of units has
been constructed using this strategy, and both ends of these units have
been cleaved by XbaI, they are progressively inserted into a vector that has
a unique XbaI site, and the ligated products are always used to transform a
dam⁺ strain of E. coli, thereby making sure that one end of the DNA unit is
always protected from cleavage by XbaI through methylation. When the
assembly of such units is completed, the final plasmid is integrated into a
Streptomyces strain for the production of the desired polyketide.
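
The protection rule described above can be expressed as a short Python check. This is a sketch under stated assumptions: the function name and the test sequence are hypothetical, and the check also treats TCTAGATC as protected, since that is the same overlap of a GATC with the XbaI site read on the opposite strand.

```python
import re


def xbai_sites(seq):
    """List (position, dam_protected) for every XbaI site (TCTAGA) in seq.

    A site is treated as protected in DNA from a dam+ host when a Dam site
    (GATC) overlaps it, i.e. GATCTAGA or TCTAGATC.
    """
    seq = seq.upper()
    sites = []
    for match in re.finditer(r"(?=TCTAGA)", seq):
        i = match.start()
        protected = seq[max(i - 2, 0):i] == "GA" or seq[i + 6:i + 8] == "TC"
        sites.append((i, protected))
    return sites


# Hypothetical unit: plain XbaI site at the 5' end, GATCTAGA at the 3' end.
unit = "GGTCTAGACCCCGATCTAGAGG"
print(xbai_sites(unit))
```

For DNA prepared from a dam⁻ host, the `dam_protected` flag is simply ignored and both sites are cleavable.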

Using this methodology, the polyketide synthase DEBS1-TE, a
multienzyme that has the first of the three bimodular erythromycin DEBS
enzymes (DEBS1) fused with the erythromycin thioesterase (Cortes et al.,
1995), was constructed in a de novo fashion. The ten inherent PKS
domains in DEBS1-TE, namely, loading module (itself composed of an AT
and an ACP), KS1 (ketosynthase of module 1), AT1, KR1, ACP1, KS2
(ketosynthase of module 2), AT2, KR2, ACP2 and TE, function in
conjunction to catalyse the synthesis of (2R,3S,4S,5R)-2,4-dimethyl-3,5-
dihydroxy-n-hexanoic acid δ-lactone (2), Figure 3.

The DNA for all ten domains was amplified by PCR to incorporate
the two aforementioned recognition sequences for XbaI (5'TCTAGA3' and
5'GATCTAGA3') at the 5' and 3' ends of the DNA unit respectively. The
PCR products were cloned in pUC18 vector, sequenced, and then used to
transform the dam⁻ E. coli ET12567 strain. To initiate the assembly
process, the DNA unit for TE was inserted into S. erythraea expression
vector pCJR24 (Rowe et al., 1998), which has a unique XbaI site. This
vector also contains a thiostrepton-resistance gene as a marker for
identifying successful integrants. The ligated products were used to
transform the dam⁺ E. coli DH10B™ strain and the plasmid DNA isolated.
This plasmid (pAR1) can only be singly cleaved with XbaI, despite
possessing two XbaI recognition sequences, as one of the sites (situated at
the 3' end of the TE unit) has been methylated by the E. coli Dam
methylase. The next DNA unit (ACP2 from module 2 of DEBS1) was then
ligated to the XbaI-cut pAR1, the ligation mixture used to transform DH10B
cells and the plasmid DNA isolated. Likewise, the other eight DNA units
were successively added to pAR1 to finally yield the expression plasmid
pAR10 containing the reconstituted DEBS1-TE gene (Figure 3). The
junctions where these domains were joined were chosen in the linker
regions that lie between these domains, so as to cause minimum
disturbance of the structural features of these domains that might in turn
affect the proficiency of the domains themselves (Figure 3). Plasmid pAR10
was then used to transform S. erythraea JC2, a mutant strain of the
wild-type S. erythraea NRRL2338 that lacks the DEBS genes except for the TE
DNA fragment (Rowe et al., 1998). Thiostrepton-resistant colonies were
selected upon integration of the vector into the S. erythraea chromosome.
Single transformants were grown on selective media, as described in the
methods section. The fermentation broth was extracted with ethyl acetate
and a sample of the organic extract was analysed by gas chromatography-
mass spectrometry (GC-MS). Two peaks were observed, corresponding to
molecular masses 158 and 172, indicating the presence of the expected
acetate- and propionate-derived polyketides (2R,3S,4S,5R)-2,4-dimethyl-
3,5-dihydroxy-n-pentanoic acid δ-lactone (1) and (2R,3S,4S,5R)-2,4-
dimethyl-3,5-dihydroxy-n-hexanoic acid δ-lactone (2). Both compounds
were isolated and fully characterised by high-pressure liquid
chromatography (HPLC), ¹H 1D and 2D NMR, ¹³C NMR, FT-ICR
spectrometry, and by comparison with a synthetic standard of (2) (Brown et
al., 1995). One litre of fermentation broth produces 24 mg of (1) and 56 mg
of (2), yields that are comparable to those reported elsewhere (Lau et al.,
1999). These yields indicate that the ten newly constructed inter-
domain junctions have not impaired the catalytic proficiency of
the DEBS1-TE synthase.
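
The stepwise XbaI additions described in this example can be mimicked with a small Python simulation, which shows why the units are added in the reverse of their final order. It is a top-strand sketch only: the vector sequence `CCTCTAGACC` is hypothetical, unit payloads are replaced by lowercase names, every insert is assumed to ligate in the desired orientation, and each intermediate is assumed to come from a dam⁺ host.

```python
import re


def cleavable_sites(construct):
    """Positions of XbaI sites (TCTAGA) not preceded by GA, i.e. not Dam-protected."""
    return [m.start() for m in re.finditer(r"(?=TCTAGA)", construct)
            if construct[max(m.start() - 2, 0):m.start()] != "GA"]


def insert_unit(construct, name):
    """Insert an XbaI-cut unit at the single cleavable XbaI site of the construct."""
    sites = cleavable_sites(construct)
    if len(sites) != 1:
        raise ValueError(f"expected one cleavable XbaI site, found {len(sites)}")
    cut = sites[0] + 1                        # XbaI cuts T^CTAGA
    fragment = "CTAGA" + name.lower() + "GAT"  # unit after XbaI digestion
    return construct[:cut] + fragment + construct[cut:]


construct = "CCTCTAGACC"  # hypothetical vector with a unique XbaI site
for unit in ["TE", "ACP2", "KR2", "AT2", "KS2", "ACP1", "KR1", "AT1", "KS1"]:
    construct = insert_unit(construct, unit)

order = re.findall(r"[a-z0-9]+", construct)
print(order)
```

After every round the 3' junction of the new unit reads GATCTAGA and is protected, so the only cleavable site is the one upstream of the most recently added unit.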

In the absence of any crystal-structure data on PKS domains, all
genetic engineering efforts known in the art have been based on trial-and-
error methods of experimenting with where to join two such domains. As a
result, the yield of the synthesised polyketide products has varied
depending upon the position in the polypeptide chain at which the domains
or modules have been linked (McDaniel et al., 1999; Ruan et al., 1997).
The successful functioning of the reconstructed polyketide synthase
described above has supplied new information about the inter-domain
junction sites. Using this information, and the described methodology for
the rapid assembly of these enzyme units, it is now possible to carry out a
'retrobiosynthetic analysis' of target molecules and then to use polyketide
and other biosynthetic enzyme domains as truly 'off-the-shelf' reagents to
achieve a stereospecific synthesis. There is also the possibility of using this
methodology for randomly combining DNA units that encode catalytic (e.g.
DH) or transport (e.g. ACP) protein domains to generate combinatorial
libraries of hybrid synthases. By using a suitable assay system to test for
biological activity of the compounds that are generated by such means, it is
possible to go back and isolate the hybrid synthetic gene responsible for the
production of these compounds.
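
To give a feel for the size of such combinatorial libraries, the following Python sketch enumerates domain combinations. The domain choices are assumptions made for illustration only (two hypothetical AT variants and four reductive-loop options built from the KR, DH and ER domains named in this document); they are not a list taken from the application.

```python
from itertools import product

# Illustrative, hypothetical choices for each module.
AT_CHOICES = ("AT-malonyl", "AT-methylmalonyl")
REDUCTIVE_LOOPS = ((), ("KR",), ("KR", "DH"), ("KR", "DH", "ER"))


def module_variants():
    """All KS-AT-[reductive loop]-ACP domain orders for a single module."""
    return [("KS", at, *loop, "ACP")
            for at, loop in product(AT_CHOICES, REDUCTIVE_LOOPS)]


def library(n_modules):
    """All hybrid synthases with n_modules modules followed by a TE domain."""
    return [modules + (("TE",),)
            for modules in product(module_variants(), repeat=n_modules)]


print(len(module_variants()), len(library(2)))
```

With these assumptions there are 8 variants per module, so a two-module library already contains 64 designs and the count grows as 8 to the power of the number of modules.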

From 6-methylsalicylic acid to maitotoxin, nature displays a
staggering diversity in compounds that are synthesised by means of
'combinatorial gene-shuffling'. This methodology, or variations of this
methodology, can be used as effective tools towards harnessing the
combinatorial potential of discrete enzymatic units or their sets that are the
feature of multi-functional PKS and other systems.

A similar system to the XbaI/dam system described above uses the
restriction enzyme FokI, which has the recognition site:

5'GGATG(N)9↓3'
3'CCTAC(N)13↑5'

with the dcm methylase of E. coli. Adding CCA or CCT to the 5' end of the
FokI recognition site would make the site dcm sensitive. Furthermore, if
the sequence TCTAGA were inserted into the redundant section of the FokI
restriction site, then the enzyme could be used to generate 'XbaI-cut ends'.
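
The FokI/dcm variant can be checked in the same way as the XbaI/dam system. The Python sketch below assumes that the dcm methylase recognises CCAGG and CCTGG, and that FokI cuts 9 nucleotides downstream of GGATG on the top strand and 13 on the bottom strand; the function names and test sequences are hypothetical.

```python
import re

FOKI = "GGATG"


def foki_site_is_dcm_sensitive(seq):
    """True if any FokI site in seq overlaps a dcm site (CCAGG or CCTGG)."""
    seq = seq.upper()
    dcm_starts = [m.start() for m in re.finditer(r"(?=CC[AT]GG)", seq)]
    for m in re.finditer(r"(?=GGATG)", seq):
        start, end = m.start(), m.start() + len(FOKI)
        if any(d < end and d + 5 > start for d in dcm_starts):
            return True
    return False


def foki_cut_positions(seq):
    """Top- and bottom-strand cut positions, as top-strand indices, for the first site."""
    i = seq.upper().find(FOKI)
    if i < 0:
        raise ValueError("no FokI site found")
    return i + len(FOKI) + 9, i + len(FOKI) + 13


print(foki_site_is_dcm_sensitive("CCAGGATG"), foki_cut_positions("GGATG" + "A" * 15))
```

Prefixing the site with CCA or CCT creates CCAGG or CCTGG across the start of GGATG, which is the overlap the check looks for.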
Methods 

E. coli dam⁺ DH10B™ strain was purchased from Gibco BRL, USA.
Pfu DNA polymerase was purchased from Boehringer, Germany.

Construction of the final expression plasmid pAR10 was carried out
in several steps, as follows. The ten PKS DNA units were amplified by PCR
using Pfu DNA polymerase. The respective regions of the eryAI gene, as well
as the oligonucleotides used for each PCR, are outlined:
LM - segment of eryAI gene (Bevitt et a!., 1992) extending from nucleotide 

25 (N) 588 to N 2389; 

5 , GGCATATGGCGGACCTGTCAAAGCTCTCCGACAGT3 , and 
5'GGTCTAGATCCCAGCCGCGGTCGGTCGGCAGTCCCG3\ 
KS1 - segment of eryAI gene extending from N 2384 to N 3769; 
S'GGTCTAGACTCGCTGTTCCACCCCGACCCCACGCGCTCGGGCACC 

30 GCGCACCA3' and 



V 
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5'GGTCTAGATCGCGCAGCGCGGCGGACTCGTCGACGGGGGCGAAG 
GCGG3', 

ATI - segment of eryAI gene extending from N 3764 to N 4813; 
5'GGTCTAGACGGTCTCGCGACGGGAAACGCCGACGGTGCCGCCGTT 
5 GGAA3' 
and 

5'GGTCTAGATCCACCGCGACACCGGCGGCGAACGCGCGGGAGAGC 
GCTTCGC3', 

KR1 - segment of eryAI gene extending from N 4808 to N 6316; 
10 5'GGTCTAGAGTCGGTGCACCTGGGCACCGGAGCACGCCGGGTGCCC 
TT3' 
and 

5'GGTCTAGATCGTCGAAGAGCCTGGTCGGGCGCTGCGCGGTGTA3', 
ACP1 - segment of eryAI gene extending from N 631 1 to N 6679; 
15 5'GGTCTAGACGACGCGCGGCGGGCTGCGCCGCAGGCGCCGGCCGA 
ACCGCGGG3' 
and 

5'GGTCTAGATCGGCCGTGG-TCGCCGGTGCCGCCTGCTCGGCT3', 
KS2 - segment of eryAI gene extending from N 6674 to N 8200; 
20 5'GGTCTAGACGAGCCGATCGCGATCGTCGGCATGGCGTGC- 
CGGCTGC3' 
and 

5'GGTCTAGATCGTGCACGGCCTCGGCGGTGTCGGCGGCGAGC- 
ACCGCGGCCCGCTCCTC3', 
25 AT2 - segment of eryAI gene extending from N 8195 to N 9340; 
5'GGTCTAGAGGCGGTGGCCGACGGCGCGGTGGTT3' 
and 

5'GGTCTAGATCGTCACGAGGGGTGGTGCGGTCCGGCAGCAGCCAGA 
A3', 

30 KR2 - segment of eryAI gene extending from N 9335 to N 1 0639; 
5'GGTCTAGACGGCTGGTTCTACC-GGGTCGACTGGACCGAG3' 
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and 

5 , GGTCTAGATCCGGCCGGGGCC6GGCGGCGG-TGTAG6ACT3 , f 
ACP2 - segment of eryAI gene extending from N 10634 to N 10966; 
SXaGTCTAGACCGCATCGTCACGACCGCGCCGAGCGAS' 
5 and 

5GGTCTAGATCG-GCGTCGAGGAAA3', 

TE - segment of eryAlllgene (Donadio etal. 1991) extending from N 8753 
to N 9602; 5'GGTCTAGACAGCGGGACTCCCGCCCGGGAAGCG3' 
and 

10 5'GGGCTAGCTCTAGATCATGAATTCCCTCCGCCCAGCCAGGCGTC3'. 
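The segment coordinates listed above can be cross-checked arithmetically. The following sketch is illustrative only: it computes the length of each eryAI segment and the number of nucleotides shared by neighbouring segments, reading the coordinates as inclusive and reading the OCR-spaced values as 6311 and 10639 (both are assumptions). The TE unit is left out because it derives from eryAIII and has its own numbering.

```python
# (unit, first nucleotide, last nucleotide) for the eryAI segments listed above.
# Inclusive coordinates are an assumption made for this sketch.
SEGMENTS = [
    ("LM", 588, 2389), ("KS1", 2384, 3769), ("AT1", 3764, 4813),
    ("KR1", 4808, 6316), ("ACP1", 6311, 6679), ("KS2", 6674, 8200),
    ("AT2", 8195, 9340), ("KR2", 9335, 10639), ("ACP2", 10634, 10966),
]


def segment_lengths(segments):
    """Length in nucleotides of each segment."""
    return {name: end - start + 1 for name, start, end in segments}


def junction_overlaps(segments):
    """Nucleotides shared between each segment and the next one."""
    return {
        f"{left[0]}/{right[0]}": left[2] - right[1] + 1
        for left, right in zip(segments, segments[1:])
    }


if __name__ == "__main__":
    for name, length in segment_lengths(SEGMENTS).items():
        print(f"{name}: {length} nt")
    for junction, shared in junction_overlaps(SEGMENTS).items():
        print(f"{junction}: {shared} nt shared")
```

Under these assumptions every junction shares six nucleotides, the length of one XbaI recognition sequence, which is consistent with each unit boundary being replaced by a single XbaI site.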

All PCR products were 5' phosphorylated and ligated to SmaI-cut,
dephosphorylated pUC18 vector and used to transform E. coli DH10B
electrocompetent cells. The desired plasmids containing the amplified
DNA fragments were isolated and sequenced using standard pUC forward
and reverse primers. No mistakes in the amplified products were detected.
All ten plasmids were then used to transform the E. coli ET12567 dam-
strain. Isolated DNA was digested with XbaI restriction enzyme and the
desired fragments isolated and purified. The TE unit was then ligated to
XbaI-cut pCJR24 vector and the ligation products used to transform E. coli
DH10B electrocompetent cells. Plasmid pAR1 was isolated, digested with
XbaI, and ligated to the ACP2 fragment, and the ligation products treated
as mentioned above. The other DNA fragments, namely KR2, AT2, KS2,
ACP1, KR1, AT1 and KS1, were sequentially added to finally yield plasmid
pAR10. This plasmid was then digested with NdeI and XbaI restriction
enzymes and ligated with the LM fragment previously digested with the
same two enzymes. The ligated products were used to transform E. coli
DH10B electrocompetent cells and the final expression plasmid pAR10
isolated. Plasmid pAR10 was then used to transform S. erythraea/JC2
strain and colonies carrying the expression plasmid were selected through
resistance to thiostrepton upon integration of the plasmid into the S.
erythraea chromosome. Single transformants were picked and grown on



WO 00/77181 




PCT/GB00/02286






tap-water medium plates supplemented with thiostrepton, following which
single transformants were grown in 5 x 200 ml of SMS liquid medium
supplemented with 5 µg/ml of thiostrepton for seven days (Rowe et al.,
1998). Cells were removed by centrifugation, and the supernatant was
saturated with NaCl and extracted three times with equal volumes of ethyl
acetate at pH 4.0. The solvent was evaporated to yield 1.12 g of crude
product. A sample of this crude product was analysed by GC-MS. Two
peaks were observed, corresponding to molecular masses 158 and 172,
indicating the presence of the expected acetate- and propionate-derived
polyketides (2R,3S,4S,5R)-2,4-dimethyl-3,5-dihydroxy-n-pentanoic acid
δ-lactone (1) and (2R,3S,4S,5R)-2,4-dimethyl-3,5-dihydroxy-n-hexanoic acid
δ-lactone (2). Compounds (1) and (2) were found to be structurally identical
to those reported previously (Cortes et al., 1995).
Characterisation of (2R,3S,4S,5R)-2,4-dimethyl-3,5-dihydroxy-n-pentanoic
acid δ-lactone (1)

1H NMR (CDCl3, 500 MHz) δH 4.45-4.35 (1H, dq, J = 6.56 and 1.62 Hz, C5-H),
3.8 (1H, dd, J = 10.15 and 4.17 Hz, C3-H), 2.45-2.70 (1H, br, O-H), 2.42
(1H, dq, J = 10.0 and 6.97 Hz, C2-H), 2.05 (1H, m, C4-H), 1.37 (3H, d, J =
7.17 Hz, C2-CH3), 1.32 (3H, d, J = 6.74 Hz, C5-CH3), 0.95 (3H, d, J = 7.20
Hz, C4-CH3) ppm. 13C NMR (CDCl3, 250 MHz) δ 174.20, 76.15, 73.62,
39.42, 38.14, 18.11, 14.24, 4.48.

Characterisation of (2R,3S,4S,5R)-2,4-dimethyl-3,5-dihydroxy-n-hexanoic
acid δ-lactone (2)

1H NMR (CDCl3, 500 MHz) δH 4.13 (1H, ddd, J = 8.12, 5.93 and 2.19 Hz,
C5-H), 3.82 (1H, m, C3-H), 2.42-2.50 (1H, dq, J = 10.17 and 7.08 Hz, C2-H),
2.12-2.19 (1H, m, C4-H), 1.77-1.86 (1H, m, one of C6-H2), 1.52-1.61 (1H, m,
one of C6-H2), 1.4 (3H, d, J = 7.09 Hz, C2-CH3), 1.0 (3H, t, J = 7.42 Hz,
C6-CH3), 0.97 (3H, d, J = 6.96 Hz, C4-CH3) ppm. 13C NMR (CDCl3, 250 MHz) δ
173.56, 81.34, 73.96, 40.08, 36.76, 25.27, 14.27, 9.88, 4.37.



Example 2: in vitro assembly of DNA units

Figure 7 outlines the strategy for the in vitro assembly of PKS DNA
units. The inventors have constructed the multienzyme DEBS1-TE. The in
vivo construction of the gene for DEBS1-TE, it should be noted, took 12
days to complete. The in vitro assembly, on the other hand, was completed
in 2 days.

All ten domains, namely LM, KS1, KR1, AT1, ACP1, KS2, AT2,
KR2, ACP2 and TE, were amplified by means of PCR. The forward primer
in all cases, except the LM, contained the SpeI recognition sequence
5'ACTAGT3', while the reverse primer was engineered in such a way that it
contained the XbaI recognition sequence 5'TCTAGA3' and the SmaI
recognition sequence 5'CCCGGG3' downstream of the XbaI site (Figure 7).
The amplification of the LM was carried out using a biotinylated forward
primer and a reverse primer that contained the XbaI recognition sequence
(5'TCTAGA3'). All the PCR products were cloned in pUC-18 vector and the
resulting plasmids sequenced to detect possible errors introduced by PCR.
All plasmids, except the one containing the LM unit, were then digested with
SpeI and SmaI, dephosphorylated in order to remove the 5' phosphate
group, and the appropriate fragments isolated and eluted. The LM unit was
cleaved with XbaI and attached to a bead that was coated with streptavidin
(following the manufacturer's instructions) as shown in figure 7.
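The primer layout described above can be sketched as follows. This is an illustration under stated assumptions, not the primers actually used: the annealing length of 18 nt, the example unit sequence and the function names are hypothetical, and the extra 5' bases that practical primers usually carry ahead of a restriction site are omitted.

```python
SPEI, XBAI, SMAI = "ACTAGT", "TCTAGA", "CCCGGG"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def design_primers(unit: str, anneal: int = 18):
    """Forward primer carries SpeI; reverse primer carries XbaI then SmaI."""
    forward = SPEI + unit[:anneal]
    # The reverse primer is the reverse complement of the 3' end of the
    # desired top strand: unit end, then XbaI, then SmaI downstream of it.
    reverse = reverse_complement(unit[-anneal:] + XBAI + SMAI)
    return forward, reverse


def expected_amplicon(unit: str) -> str:
    """Top strand of the PCR product: SpeI - unit - XbaI - SmaI."""
    return SPEI + unit + XBAI + SMAI


# Hypothetical 36 nt stand-in for a PKS unit.
unit = "ATGGCGGACCTGTCAAAGCTCTCCGACAGTGGCACC"
fwd, rev = design_primers(unit)

if __name__ == "__main__":
    print("forward:", fwd)
    print("reverse:", rev)
    print("amplicon:", expected_amplicon(unit))
```

Because both the XbaI and SmaI sequences are palindromic, the reverse primer begins with CCCGGG followed by TCTAGA, which places the SmaI site downstream of the XbaI site in the amplified unit.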

The assembly process was initiated by adding DNA ligase to the
tube containing a large excess of the first unit (KS1) and LM-bead. The
reason for having a large excess of the KS1 unit compared to the LM-bead
unit is to favour the LM-bead ligating to the incoming unit, as opposed to
the self-ligation of the LM-bead (see figure 7). The ligation of the two DNA
fragments is unidirectional, as only the SpeI-cut end of KS1 complements
the XbaI-cut end of the LM-bead. After the ligation was complete, the
desired product of the ligation reaction, namely 'bead-LM-KS1', was
separated from the reaction mixture and washed. This product was then
cleaved with XbaI, in order to activate the 3' end of KS1. The beads were
washed again to remove the small XbaI-SmaI DNA fragment that was
released from the 3' end of KS1 as a result of restriction enzyme cleavage.
The 'activated' bead-LM-KS1 unit was then ligated with SpeI, SmaI-cut and
5' dephosphorylated AT1. The SpeI-cut 5' end of AT1 complemented the
XbaI-cut 3' end of KS1 to give bead-LM-KS1-AT1 as shown in figure 8.
This product was separated from the reaction mixture and washed as
before. The 3' end of AT1 in this product was then 'activated' through
cleavage by XbaI, and the assembly process continued.
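One ligation and activation cycle can be modelled on the top strand alone. The sketch below is a simplification under stated assumptions: XbaI is taken to cut T^CTAGA and SpeI A^CTAGT, both leaving a CTAG overhang; the flanking sequences AAAA and GGGG are hypothetical stand-ins for real units; and the names `cut` and `add_unit` are illustrative.

```python
XBAI = "TCTAGA"   # cut as T^CTAGA, leaving a 5' CTAG overhang
SPEI = "ACTAGT"   # cut as A^CTAGT, leaving the same CTAG overhang


def cut(seq: str, site: str):
    """Cut the top strand after the first base of the first `site`."""
    i = seq.index(site)
    return seq[: i + 1], seq[i + 1:]


def add_unit(chain: str, unit: str):
    """Ligate a SpeI-cut unit onto an XbaI-cut chain, then re-activate."""
    _, body = cut(unit, SPEI)            # SpeI digestion of the incoming unit
    ligated = chain + body               # compatible CTAG overhangs anneal
    activated, _ = cut(ligated, XBAI)    # XbaI releases the XbaI-SmaI tail
    return ligated, activated


# Hypothetical stand-ins: LM ends in an XbaI site; KS1 is SpeI...XbaI-SmaI.
lm_cut, _ = cut("AAAA" + XBAI, XBAI)
ks1 = SPEI + "GGGG" + XBAI + "CCCGGG"
ligated, activated = add_unit(lm_cut, ks1)

if __name__ == "__main__":
    print("ligated:  ", ligated)
    print("activated:", activated)
```

The junction reads TCTAGT, which is neither an XbaI nor a SpeI site, so the only XbaI site left in the ligated product is the one at the 3' end of the newly added unit. This is why repeated XbaI cleavage activates the growing end without cutting earlier junctions.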

Finally, SpeI, SmaI-cut and 5' dephosphorylated TE unit was ligated
with the DNA fragment that was now
bead-LM-KS1-AT1-KR1-ACP1-KS2-AT2-KR2-ACP2 as shown in figure 9. The 3'
end of the latter fragment was 'activated' by digesting it with XbaI. The
assembled DEBS1-TE gene was then inserted in the expression plasmid
pCJR24 and the resulting plasmid used to transform a Streptomyces strain.
The expected triketide lactone products were isolated and structurally
characterised.

Use of the in vitro technology described above drastically reduces
the time it takes to assemble predetermined or randomly shuffled genes.
Also, the possibility of continuing with the assembly process while having
numerous different assembly arrays attached to the beads, and splitting
and mixing the beads between each unit/module addition from a library of
units/modules, results finally in the generation of a cascade of different
assemblies (Figure 10). These assembled genes can then be cloned
simultaneously and expressed in a suitable host. An assay system can
then be used to identify those assembled genes that yield bio-active
compounds.
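The split-and-mix idea can be illustrated by enumerating the assemblies it would generate. The sketch below assumes an idealised process in which every bead pool receives every unit offered in a round; the unit names in the example library and the function name `split_and_mix` are hypothetical.

```python
from itertools import product


def split_and_mix(anchor: str, rounds):
    """Enumerate every assembly reachable when, in each round, the beads are
    split across all units offered in that round and then pooled again."""
    return ["-".join((anchor, *combo)) for combo in product(*rounds)]


# Hypothetical three-round library with two alternative units per round.
library = split_and_mix(
    "bead-LM",
    [["KS1", "KS2"], ["AT1", "AT2"], ["KR1", "KR2"]],
)

if __name__ == "__main__":
    print(len(library), "assemblies")
    for assembly in library:
        print(assembly)
```

The number of distinct assemblies is the product of the number of alternatives offered in each round, so even a small unit library grows into a large cascade after a few additions.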

Example 3: Retrobiosynthetic synthesis of a target molecule

A strategy employing the invention in order to construct the highly
potent anti-breast cancer drug discodermolide, the anticholesterol
compound decarestrictine, and the antitumour compound octalactin using
polyketide synthase domains/modules is outlined below.

Discodermolide

The drug discodermolide (Figure 18), isolated from the marine sponge
'Discodermia dissoluta', has been identified as a highly potent anti-cancer
compound, 80 times more effective than the well-known anticancer
drug Taxol (TerHarr et al., 1996). It has the same mechanism of action as
Taxol, even though it is structurally different from the latter.

One can infer from its structure (Figure 18) that discodermolide is a
polyketide and can therefore be constructed from a system that has the
basic enzymatic building blocks (domains and modules) that make other
polyketides like erythromycin and rapamycin. Having predicted that
approximately 45 domains housed in 12 modules would be required in
order to carry out the chemistry that accounts for the functionalities on the
carbon skeleton of discodermolide, one can now begin to construct such a
system. All one has to do is to identify the type and nature of the
domains/modules that one requires to generate the observed
functionalities, and then assemble these units in the desired order (Figure
18). The resulting DNA assembly can then be put into a bacterial strain that
makes a functional polyketide synthase.

Until now, it would have been exceedingly difficult, if not impossible,
to assemble 45 or so pieces of DNA in the wanted order, for several
reasons. Firstly, one would have to look for two different restriction
enzymes every time one needed to assemble two DNA segments. This is
because if one uses just one restriction enzyme at either end of the
domain, the already-assembled piece/pieces of DNA would be cleaved
from the assembly every time one decided to insert a new domain.
Secondly, in GC-rich DNA like that of the polyketide synthase producing
Streptomyces strains, unique restriction enzyme sites are few and far
between. To a molecular biologist, the task of assembling 40 pieces of
DNA with the limitations mentioned above would seem an insurmountable
one. One would rather attempt to isolate the genes that make the drug in
the first place than consider carrying out "step-by-step" reconstruction of
the gene itself. In the case of discodermolide, even the last possibility is in
the realms of fantasy. The organism within the marine sponge that makes
the drug has not been identified. The only way discodermolide can be
made available is through chemical synthesis; there have been a few
chemical routes reported in the literature recently (Marshall and Johns, 1998
and references therein). However, as is the case with most other complex
molecules, large-scale production of discodermolide using the chemical
route would turn out to be outrageously expensive. Chemists have been
using the retrosynthetic analysis approach towards total synthesis of
important bioactive molecules. This approach breaks the target compound
into many smaller, easily synthesised pieces, which are then
re-assembled.

The types of polyketide or other synthetic enzyme domains required
in order to construct the target molecule from the starting units are
identified using a "retrobiosynthetic analysis" approach for discodermolide,
by matching which molecules need to be condensed to form the
macromolecule with the enzyme domains that carry out the required
catalysis to build the macromolecule.

Having identified the enzyme units that are required, the unit-DNA
segments are amplified using the polymerase chain reaction (PCR) from
the library of existing polyketide synthase unit-DNA, and the appropriate
recognition sequences are attached to each unit-DNA fragment. All of the
unit fragments are then replicated in a dam- strain whereby both the
unmodified and modified sequences (5'TCTAGA3' and 5'GATCTAGA3'
respectively) are cleaved by the restriction enzyme XbaI.
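The difference between the two sequences in dam+ and dam- backgrounds can be expressed as a simple rule. The sketch below is an illustration, assuming that dam methylates the adenine in GATC and that XbaI is blocked whenever a GATC overlaps its site. The downstream case TCTAGATC is an addition not mentioned in the text above, and the example sequence and function name are hypothetical.

```python
import re

XBAI = "TCTAGA"
_XBAI_ANYWHERE = re.compile(f"(?={XBAI})")


def cleavable_xbai_sites(seq: str, dam_plus: bool) -> list:
    """Start positions of XbaI sites that XbaI can still cut."""
    sites = [m.start() for m in _XBAI_ANYWHERE.finditer(seq)]
    if not dam_plus:
        return sites                      # no dam methylation: all sites cut
    unblocked = []
    for i in sites:
        upstream_gatc = seq[max(i - 2, 0):i] == "GA"    # GATCTAGA
        downstream_gatc = seq[i + 6:i + 8] == "TC"      # TCTAGATC
        if not (upstream_gatc or downstream_gatc):
            unblocked.append(i)
    return unblocked


# Hypothetical unit: plain site at the 5' end, GATCTAGA at the 3' end.
unit = "CC" + "TCTAGA" + "AAAA" + "GATCTAGA" + "CC"

if __name__ == "__main__":
    print("dam-:", cleavable_xbai_sites(unit, dam_plus=False))
    print("dam+:", cleavable_xbai_sites(unit, dam_plus=True))
```

In the dam- case both sites are reported as cleavable, whereas in the dam+ case only the unmodified 5' site remains, which is the asymmetry the assembly scheme relies on.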

Having constructed this library of appropriate PKS or other synthetic
enzyme units, the corresponding DNA units are then assembled. The
assembled DNA piece is then placed in a vector, so that it can be inserted
in a bacterial strain to yield the desired synthetic protein. Suitable vectors
have an antibiotic resistance marker (for selection of this vector on an
antibiotic-rich medium) and an "origin of replication" (ori). The ori is
essential for the independent growth of the vector in any strain.
Particularly suitable vectors for the expression of the synthetic enzymes
of the invention are the actinomycete vectors described by Rowe et al.
(1998).

The strain is then grown in a medium that is supplemented with the
antibiotic, the resistance gene for which is present in the vector.

Figures 4 and 5 show how the assembly proceeds. The first domain
is inserted into a vector that has been cut with XbaI. After the ligation
of the domain with the vector has taken place, the DNA is put in a bacterial
strain that is dam+ and grown. Finally, bacterial colonies that have the
desired vector-domain DNA are identified and DNA isolated from them. The
whole procedure is cheap and fast. Only one restriction enzyme (XbaI) is
used and routine cloning technology is employed. The desired DNA
fragment is obtained, which can then be expressed in a Streptomyces
strain to yield the polyketide synthase.

The in vivo "domain-by-domain" construction of the discodermolide
producing polyketide synthase would take approximately 55 days via this
method. In comparison, assembly of modules would take less time, as one
would need to assemble fewer pieces. Most importantly, once the synthase
is shown to be functionally active, a large fermentation of the bacterial
strain can be carried out, and the drug isolated in however much quantity
one requires - unlike the chemical route where the starting materials have
to be freshly synthesised every time one requires the target compound.
Employing such a strategy would lead to a quick and inexpensive synthesis
of important bioactive molecules like discodermolide.
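The figure of approximately 55 days is consistent with a simple linear extrapolation from Example 2. The sketch below is only a back-of-envelope check: it assumes that the time scales in proportion to the number of units and takes as its reference the 12 days reported for the ten-unit DEBS1-TE gene; the function name is hypothetical.

```python
def estimated_days(units: int, ref_units: int = 10, ref_days: float = 12.0) -> float:
    """Linear extrapolation of in vivo assembly time from a reference build.

    Assumes each unit addition takes the same time as in the reference
    construction (10 units in 12 days); this is an assumption, not a
    measured rate.
    """
    return units * ref_days / ref_units


if __name__ == "__main__":
    print(estimated_days(45))   # about 45 domains predicted for discodermolide
    print(estimated_days(12))   # module-by-module assembly of 12 modules
```

Under this assumption 45 domains correspond to roughly 54 days, in line with the stated estimate, while assembling the 12 modules as whole units would take correspondingly less time.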
Retrobiosynthetic analysis 

The whole approach (retrobiosynthetic analysis followed by
identification of PKS units, followed by assembly of PKS units) is made
clearer in the following two examples.

Octalactin

A new addition to the rare class of eight-membered lactone natural
products is the octalactin family. Octalactin A and B (Figure 20) are
natural products isolated from the marine gorgonian octocoral 'Pacifigorgia
sp.' (Tapiolas et al., 1991). Octalactin A shows very strong cytotoxicity
toward B-16-F-10 murine melanoma and HCT-116 human colon tumour
cell lines and is a promising drug candidate, while octalactin B displayed
no such activity (Tapiolas et al., 1991). Total syntheses of both octalactin A
and B have been reported in the literature. One such synthesis (Buszek et
al., 1994) typically involves more than 12 chemical steps in leading to the
target molecules. Clearly, large-scale production of octalactins using
chemical synthesis is not industrially viable. On the other hand, the genes
that code for the enzymes that make octalactins have not been identified or
isolated. This means that at present, modified octalactins can only be made
using chemical synthesis. A gene is constructed from the available PKS
spare parts that would code for the enzymes that would make octalactin
B. Octalactin B can then be converted into the cytotoxic octalactin A by
one-step stereospecific epoxidation. Also, once the gene for octalactin B is
constructed and shown to make the octalactin PKS, genetic engineering on
this gene would yield modified octalactin PKSs that in turn would
synthesise octalactin analogues.

Octalactin B is clearly a polyketide: its carbon skeleton (Figure 19)
can be seen to be assembled from acetate and propionate units. The uptake
and assembly of these units in the prescribed sequence, as well as the
functionalities that decorate the carbon chain of octalactin, can be assigned
to various PKS modules (see figure 19). Once a decision has been made
regarding the type and nature of PKS modules, they can be strung together
to make a gene using the invention. This gene can then be expressed in a
suitable host in order to look for octalactin B production. The
retrobiosynthetic approach towards octalactin is shown in detail in figure
19. The choice of which modules to select from the PKS module library is
followed by amplification of the modular DNA fragments using
oligonucleotides such that the 5' and the 3' ends of every DNA fragment
have the restriction enzyme recognition sites stated under the description
of the invention. The choice of modules that, when assembled, would make
the 'octalactin gene' is displayed as a schematic representation in figure
20.

Decarestrictine J

The molecule decarestrictine J can be synthesised using the
retrobiosynthetic approach. Decarestrictine J is a ten-membered lactone
from the family of decarestrictines, which have been shown to display
strong anti-cholesterol activity (Grabley et al., 1992). The total synthesis of
decarestrictine J has been reported and involves numerous chemical steps
(Yamada et al., 1995). The target molecule (figure 21) can be conceived to
be formed by assembly of five acetate polyketide units. Using the
retrobiosynthetic approach, one can identify the PKS domains/modules that
would be required for the carbon skeleton of decarestrictine J. A
hypothetical decarestrictine PKS is shown in figure 21. The loading module,
as well as the four internal modules along with the TE domains, can be
conveniently assembled using the invention. The assembled
'decarestrictine gene' can then be expressed in a suitable host in order to
check for the production of decarestrictine J.

In summary, the retrobiosynthetic approach involves the following
steps:

a) Identification of the number and nature of carbon units that make up the
target molecule

b) Identification of the modules/domains from libraries of
polyketide/peptide synthetase/fatty acid/etc. encoding units that are
responsible for the uptake of the said carbon units and the nature and
degree of functionalisation of the carbon chain

c) Assembly of the said modules/domains using the methods of the
invention

d) Expression of the assembled gene in a suitable expression host.
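Steps a) to c) above can be sketched as a lookup from the functionality observed at each chain-extension position to the domains a module would need. The mapping below follows the generally accepted logic of modular polyketide synthases (a ketoreductase for a hydroxyl, plus a dehydratase for a double bond, plus an enoyl reductase for a saturated carbon); it is an assumption brought in for illustration, and the DH and ER domains, the dictionary and the function name `plan_pks` are not taken from the examples above.

```python
# Reductive domains assumed necessary for each beta-carbon state.
REDUCTION_DOMAINS = {
    "keto": [],
    "hydroxyl": ["KR"],
    "alkene": ["DH", "KR"],
    "methylene": ["DH", "ER", "KR"],
}


def plan_pks(extensions):
    """Return the domain list per module for a linear polyketide.

    `extensions` is a list of (extender unit, beta-carbon state) pairs,
    one per chain-extension step, in order of incorporation.
    """
    modules = [["LM"]]                       # loading module
    for extender, state in extensions:
        modules.append(
            ["KS", f"AT({extender})", *REDUCTION_DOMAINS[state], "ACP"]
        )
    modules[-1].append("TE")                 # chain release after last module
    return modules


# A DEBS1-TE-like plan: two propionate extensions, each leaving a hydroxyl.
plan = plan_pks([("propionate", "hydroxyl"), ("propionate", "hydroxyl")])

if __name__ == "__main__":
    for module in plan:
        print("-".join(module))
```

For two propionate extensions that each leave a hydroxyl group, the plan contains ten domains in total, matching the ten units (LM, KS1, AT1, KR1, ACP1, KS2, AT2, KR2, ACP2 and TE) assembled in Examples 1 and 2.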

Example 4: Transforming strains with DNA encoding similar synthetic
enzyme domains

A method for transforming expression strains with DNA encoding similar
synthetic enzyme domains has been devised. Instead of using the TE PKS
DNA fragment as a region of integration from the assembled gene into a
Streptomyces host (S. erythraea/JC2, Rowe et al., 1998), a mutated recA
gene fragment from Streptomyces is used. The assembly process is carried
out in a recA- E. coli strain (e.g. DH10B) as previously described. As this
strain is recA-, one can assemble any number of identical DNA units. The
vector, into which the assembled gene is being constructed, contains a
portion of a Streptomyces recA gene. This recA fragment carries a
mutation. After the synthetic enzyme gene has been assembled, the vector
is used to transform a Streptomyces host (e.g. S. lividans or S. erythraea).
The fragment of recA gene carrying a mutation recombines with the recA
gene of the Streptomyces host, abolishing the functional recA gene and
making the strain recombination minus (Figure 11). This means that an
event such as the one described in figure 2 is now not possible. The strain
is then grown to look for the encoded enzyme product. This strategy is
tested by assembling a functional PKS gene having more than one type of
identical DNA units (Figure 12).

Construction of the PKS multienzyme recDEBS1-TE

RecA protein has been characterised as a multifunctional enzyme that is
essential for homologous recombination, DNA repair, SOS response and
DNA rearrangements (Miller and Kokjohn, 1990). Most of the routinely used
strains of E. coli are recA-. The gene for recA has been identified from
many Streptomyces strains. The first Streptomyces recA gene to be
characterised and isolated was from S. lividans (Nußbaumer and
Wohlleben, 1994). RecA mutants have since been generated in S.
ambofaciens (Aigle et al., 1997). The Streptomyces RecA protein has
approximately 372 amino acid residues (Figure 13). DNA sequence
analysis suggests a coding region of 1122 bp, which is found to be highly
conserved within Streptomyces (Figure 14). In fact, the recA mutants of S.
ambofaciens were generated by integrating a mutated portion of the S.
lividans recA gene into the S. ambofaciens host. It was found that a recA
mutant lacking 30 aa from the C-terminus of the protein inhibited
recombination events in S. ambofaciens (Aigle et al., 1997).

A recA mutant of the Streptomyces host that is used for expression
of the assembled gene was generated.
The oligonucleotides:

5'-GGTCTAGATTCGGCAAGGGCGCCGGTCATGCGCAT-3' and
5'-GGTCTAGATCTGCGGCGTCGGCCGGGGCGGCGGAGGCG-3'

were used as the forward and reverse primers respectively, and the 1000
bp internal region of the S. lividans recA gene (Nußbaumer and Wohlleben,
1994) was amplified using Pfu polymerase. An additional nucleotide (C)
was incorporated into the forward primer to generate a frame shift in the
amplified recA gene fragment. The PCR product was cloned in pUC-18
vector and sequenced to detect possible errors introduced during PCR. The
1.0 kbp recA fragment, flanked at both ends by an XbaI site, was then
inserted in the expression vector pCJR24, which has a unique XbaI site. The
ligation mixture was used to transform E. coli DH10B cells and the desired
plasmid DNA isolated. The resulting plasmid (pARecA24) contains a
non-methylated XbaI site at the 5' end of the recA gene fragment. The ten
PKS DNA units, namely TE, two each of ACP1, KR1, AT1 and KS1, and LM,
were inserted into the plasmid pARecA24 to finally yield the expression
plasmid pRecADME. This plasmid was used to transform wild-type S. lividans
protoplasts, and thiostrepton-resistant colonies were grown in defined liquid
media as described above. The compound (Figure 12) was isolated from
the bacterial broth and chemically characterised.

Thus, it has been shown that a gene carrying interspaced DNA units
that are identical in structure as well as function does not lead to internal
recombination events, as the native recA gene of the Streptomyces host
has been disrupted. Furthermore, it has been shown that it is possible to
use identical domains to reach the objective of generating hybrid synthetic
enzyme systems. This strategy will greatly reduce the number of domains
that otherwise have to be employed for the purposes of de novo PKS gene
assembly that yields the desired chemical compounds. The inventors have
established a set of 12 domains that are capable of functioning robustly
and are independent of flexibility and spatial constraints, problems that
beset the choice of domains and modules previously.

CLAIMS

1. A method of assembling several DNA units in sequence in a
DNA construct, which method comprises the steps of

a) providing each DNA unit with a restriction enzyme recognition
sequence at its 5' end and with a recognition sequence for the same
restriction enzyme at its 3' end that is combined with a recognition site for a
DNA modification enzyme,

b) providing a starting DNA construct having an accessible
restriction site for the same or a compatible restriction enzyme and cleaving
the starting DNA construct with such a restriction enzyme,

c) inserting the desired DNA unit and bringing the ligated
product into contact with a DNA modification enzyme such that the
restriction site at the 3' end of the inserted DNA unit is abolished,

d) cleaving the ligated product at an accessible unmodified
recognition site for the same or a compatible restriction enzyme,

e) repeating steps c) and d) to introduce each desired DNA unit
to give a DNA construct containing all the desired units in sequence.

2. The method of claim 1 wherein the DNA modification enzyme
is a methylase.

3. The method of claim 2 wherein the methylase is the dam
methylase of Escherichia coli.

4. The method of claim 3 which comprises the steps of

a) providing each DNA unit with an XbaI recognition sequence
5'XXTCTAGA3' (where XX is not GA) at its 5' end and with an XbaI
recognition sequence 5'GATCTAGA3' at its 3' end,

b) providing a starting DNA construct having an accessible XbaI
site and cleaving the starting DNA construct with XbaI,

c) inserting the desired DNA unit and using a resulting ligated
product to transform a dam+ strain of E. coli,

d) recovering a resulting plasmid and cleaving the plasmid at an
accessible XbaI site with XbaI,

e) repeating steps c) and d) to introduce each desired DNA unit
to give a DNA construct containing all the desired units in sequence.

5. The method of any one of claims 1 to 4, wherein the
recognition sequences for the restriction enzyme and the DNA modification
enzyme are created in the DNA units prior to cutting with the restriction
enzyme.

6. The method of claim 5 wherein the restriction sites are
created in the fragment by means of a primer extension reaction.

7. The method of any one of claims 1 to 6, wherein the DNA
construct is an expression vector capable of facilitating expression of the
protein encoded by the desired DNA units.

8. The method of claim 3 or claim 4, wherein the DNA
modification is removed and the restriction site re-established by replicating
the ligated product in a dam- strain of E. coli by means of a suitable vector.

9. A method of making an assembly of several DNA units in
sequence which method comprises the steps of:

a) providing a first DNA unit with a recognition sequence for a
first restriction enzyme at its 3' end, and cleaving the said first DNA unit
with said first restriction enzyme,

b) providing each other DNA unit with a recognition sequence at
its 5' end for a second restriction enzyme which has a compatible ligation
sequence with that of the first restriction enzyme, and a downstream
recognition sequence for said first restriction enzyme followed by a
downstream recognition sequence for a third restriction enzyme at its 3'
end, and cleaving each said other DNA unit with the second and third
restriction enzymes,

c) ligating the said first DNA unit with a desired other DNA unit
to form a ligated product such that the ligation of the two units abolishes the
recognition site for the first restriction enzyme at the ligation junction, and
cleaving the ligated product with said first restriction enzyme,

d) ligating the product from c) with a desired DNA unit from b) to
form a ligated product and cleaving the ligated product with said first
restriction enzyme,

e) repeating step d) with each other DNA unit in turn so as to
assemble the DNA units in sequence.

10. The method of claim 9 which method comprises the steps of:

a) providing a first DNA unit with an XbaI recognition sequence
5'TCTAGA3' at its 3' end, and cleaving the said first DNA unit with XbaI,

b) providing each other DNA unit with a SpeI recognition
sequence 5'ACTAGT3' at its 5' end, and a downstream XbaI recognition
sequence 5'TCTAGA3' followed by a downstream SmaI recognition
sequence 5'CCCGGG3' at its 3' end, cleaving each said other DNA unit
with SpeI and SmaI, and dephosphorylating the 5' end of the cleaved DNA
unit,

c) ligating the said first DNA unit with a desired other DNA unit
to form a ligated product and cleaving the ligated product with XbaI,

d) ligating the product from c) with a desired DNA unit from b) to
form a ligated product and cleaving the ligated product with XbaI,

e) repeating step d) with each other DNA unit in turn so as to
assemble the DNA units in sequence.

11. The method of claim 9 or claim 10 wherein the assembly
occurs via stepwise addition of fragments to a vector.

12. The method of claim 9 or claim 10 wherein the said first DNA
unit is attached to the solid phase for use in step c).

13. The method of claim 12, wherein the solid phase is split and
mixed between steps c), d), and e) to make several different assemblies.

14. The method of any one of claims 9-13, wherein the
recognition sequences in one or more of the DNA units are introduced by
means of extension primers.

15. The method of any one of claims 9-14 wherein the assembly
of several DNA units is inserted into an expression vector which is used to
transform a host capable of expressing the protein encoded by the vector.

16. The method of any one of claims 1-15, wherein one or more
of the DNA units encodes a catalytic or transport protein domain (see
Kleinkauf peptide/polyketide systems paper).

17. The method of claim 16 wherein one or more of the DNA
units are derived from polyketide synthesising enzyme domain DNA
sequences.

18. The method of claim 16 wherein one or more of the DNA
units are derived from peptide synthesising enzyme domain DNA
sequences.

19. The method of claim 16 wherein one or more of the DNA
units are derived from hybrid peptide polyketide enzyme domain DNA
sequences.

20. The method of claim 16 wherein one or more of the DNA
units are derived from fatty acid synthesising enzyme domain DNA
sequences.

21. The method of claim 16 wherein one or more of the DNA
units encode modules comprising one or more catalytic or transport
domains.

22. DNA constructs incorporating one or more DNA assemblies
encoding synthetic enzymes made by any one of the methods of
claims 1-21.

23. Synthetic enzymes encoded by one or more DNA assemblies
made by the methods of any one of claims 1-21.

24. Hosts expressing DNA constructs encoding one or more
synthetic enzymes made by any one of the methods of claims 1-21.

25. Hybrids of transformed hosts expressing one or more DNA
constructs encoding synthetic enzymes incorporating a DNA assembly
made by any one of the methods of claims 1-21.

26. Compounds produced by synthetic enzymes encoded by
DNA assemblies made by any one of the methods of claims 1-21.
27. A method of synthesising a target molecule comprising the
steps of:

a) examining the composition and stereochemistry of a target
molecule,

b) determining which catalytic and transport domains need to be
present in a synthetic enzyme in order to catalyse the synthesis of the
target molecule,

c) using any one of the methods of claims 1-21 to assemble the
required DNA units encoding the catalytic and transport domains into a
DNA assembly that encodes said synthetic enzyme which is capable of
synthesising the target molecule,

d) placing the DNA assembly into a vector to allow expression of
the synthetic enzyme in a host capable of synthesising the target molecule
after transformation with said vector.

28. The method of claim 27 wherein the transformed host is 
tested for the presence of the target molecule after step d). 

29. The transformed host of claim 27. 

30. Use of the transformed host of claim 27 to produce said target
molecule.

31. A method of making a synthetic enzyme to catalyse the
synthesis of a target molecule comprising the steps of:

a) examining the composition and stereochemistry of a target
molecule,

b) determining which catalytic and transport domains need to be
present in the synthetic enzyme in order to catalyse the synthesis of the
target molecule,

c) using any one of the methods of claims 1-21 to assemble the
required DNA units encoding the catalytic and transport domains into a
DNA assembly that encodes an enzyme which is capable of synthesising
the target molecule,

d) expressing the DNA assembly in a suitable host to produce
the enzyme.

32. A library of DNA units encoding catalytic or transport protein
domains, wherein each DNA unit has a recognition sequence for a
restriction enzyme at its 5'-end and a second recognition sequence for the
same or a compatible enzyme at its 3'-end which incorporates a
recognition sequence for a DNA modifying enzyme.

33. The library of claim 32, wherein each DNA unit has an XbaI
recognition sequence 5'XXTCTAGA3' (where XX is not GA) at its 5'-end
and an XbaI recognition sequence 5'GATCTAGA3' at its 3'-end.
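
For illustration only, and not as part of the claims, the flanking-sequence requirement of claim 33 can be checked on a candidate DNA unit with a short Python sketch. The function name `has_claim33_flanks` and the example sequences are hypothetical. The sketch assumes the unit is supplied as a single 5'-to-3' strand that begins and ends exactly with the eight-base motifs. It is also worth noting that the 3' motif GATCTAGA contains GATC, the recognition sequence of Dam methylase; that this is the DNA modifying enzyme intended in claim 32 is an assumption, since the claims do not name it.

```python
XBAI_SITE = "TCTAGA"  # XbaI recognition sequence


def has_claim33_flanks(unit: str) -> bool:
    """Return True if the unit starts with XXTCTAGA (XX != GA)
    and ends with GATCTAGA, read 5' to 3' on one strand."""
    seq = unit.upper()
    if len(seq) < 16:  # too short to carry two separate 8-base motifs
        return False
    head, tail = seq[:8], seq[-8:]
    five_prime_ok = head[2:] == XBAI_SITE and head[:2] != "GA"
    three_prime_ok = tail == "GA" + XBAI_SITE
    return five_prime_ok and three_prime_ok


# Hypothetical example units; the internal sequence is arbitrary filler.
good_unit = "CCTCTAGA" + "ATGGCCAAGCTT" + "GATCTAGA"
bad_unit = "GATCTAGA" + "ATGGCCAAGCTT" + "GATCTAGA"  # 5' site preceded by GA
```

Here `good_unit` satisfies both conditions, while `bad_unit` fails because its 5' XbaI site is preceded by GA.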

34. A library of DNA units encoding catalytic or transport protein
domains, wherein each DNA unit has a recognition sequence at its 5' end
for a first restriction enzyme, and a downstream recognition sequence for a
second restriction enzyme followed by a downstream recognition
sequence for a third restriction enzyme at its 3' end, such that the DNA
units, once restricted by the first and second restriction enzymes, can be
ligated together to abolish the restriction sites at the ligation junction.

35. The library of claim 34, wherein each DNA unit has a SpeI
recognition sequence 5'ACTAGT3' at its 5'-end, and a downstream XbaI
recognition sequence 5'TCTAGA3' followed by a downstream SmaI
recognition sequence 5'CCCGGG3' at its 3'-end.
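
For illustration only, and not as part of the claims, the ligation behaviour recited in claim 34 can be sketched in Python for the SpeI/XbaI pair of claim 35. SpeI (A^CTAGT) and XbaI (T^CTAGA) both leave 5'-CTAG overhangs, so the XbaI-cut 3' end of one unit can be joined to the SpeI-cut 5' end of the next, and the resulting junction TCTAGT is recognised by neither enzyme. The function names and example units are hypothetical, and the sketch models the top strand only.

```python
SPEI = "ACTAGT"  # SpeI cuts A^CTAGT
XBAI = "TCTAGA"  # XbaI cuts T^CTAGA
SMAI = "CCCGGG"  # SmaI site, placed downstream of the XbaI site


def digest(unit: str) -> str:
    """Top-strand fragment left after SpeI (5' end) and XbaI (3' end) digestion.

    Assumes the unit starts with the SpeI site and contains exactly one
    XbaI site downstream of it.
    """
    seq = unit.upper()
    if not seq.startswith(SPEI) or seq.count(XBAI) != 1:
        raise ValueError("unit must start with SpeI and contain one XbaI site")
    start = 1                  # SpeI cuts after the first A of ACTAGT
    end = seq.index(XBAI) + 1  # XbaI cuts after the first T of TCTAGA
    return seq[start:end]


def ligate(upstream: str, downstream: str) -> str:
    """Join two digested fragments through their compatible CTAG overhangs."""
    return upstream + downstream


# Hypothetical units: SpeI site, arbitrary filler, XbaI site, SmaI site.
unit_a = SPEI + "ATGGCC" + XBAI + SMAI
unit_b = SPEI + "GGTACC" + XBAI + SMAI

assembly = ligate(digest(unit_a), digest(unit_b))
```

In this sketch `assembly` contains the hybrid junction TCTAGT but no intact SpeI or XbaI site, so a later digestion with either enzyme would not cut at the junction.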

34. The library of claim 32 or claim 34, wherein the DNA units
encode polyketide synthetic domains, comprising two KS domains, at least
two AT domains, two KR domains, two DH domains, two ER domains, an
ACP domain and a TE domain.

35. A module comprising a DNA sequence encoding a functional
set of polyketide synthetic domains wherein the module has a recognition
sequence for a restriction enzyme at its 5'-end and a second recognition
sequence for the same or a compatible enzyme at its 3'-end which
incorporates a recognition sequence for a DNA modifying enzyme.

36. The module as claimed in claim 35, wherein the module has
an XbaI recognition sequence 5'XXTCTAGA3' (where XX is not GA) at its
5'-end and an XbaI recognition sequence 5'GATCTAGA3' at its 3'-end.

37. A module comprising a DNA sequence encoding a functional
set of polyketide synthetic domains wherein the module has a recognition
sequence at its 5' end for a first restriction enzyme, and a downstream
recognition sequence for a second restriction enzyme followed by a
downstream recognition sequence for a third restriction enzyme at its 3'
end, such that the DNA units, once restricted by the first and second
restriction enzymes, can be ligated together to abolish the restriction sites at
the ligation junction.

38. The module as claimed in claim 37, wherein the module has a
SpeI recognition sequence 5'ACTAGT3' at its 5'-end, and a downstream
XbaI recognition sequence 5'TCTAGA3' followed by a downstream SmaI
recognition sequence 5'CCCGGG3' at its 3'-end.

39. A module as claimed in claim 35 or claim 37, wherein the
DNA units encode polyketide synthetic domains, comprising two KS
domains, at least two AT domains, two KR domains, two DH domains, two
ER domains, an ACP domain and a TE domain.

40. A vector containing one or more modules as claimed in claim
35 or claim 37.

41. The vector as claimed in claim 40, wherein a non-functional
recA gene is also present.

42. A method of transforming a host with one or more synthetic
DNA assemblies encoding enzyme domains which comprises the steps of:

a) inserting said DNA assembly into a vector containing a
mutated internal fragment of a recA gene sequence such that the vector is
capable of undergoing homologous recombination with the recA gene of
the host,

b) bringing said vector into contact with a host chromosome
under conditions which permit homologous recombination to take place,

c) disrupting the host recA gene by the integration of the DNA of
said vector into the chromosome.

43. The method of claim 42 wherein the expression vector is
used to transform a Streptomyces host.

44. The method of claim 42 or claim 43, wherein the DNA 
assemblies are modules according to claim 35 or claim 37. 

45. A host lacking a recA function, transformed with a vector
containing one or more modules according to claim 35 or 37.

46. A kit containing DNA units, DNA modules, vectors, DNA
manipulation hosts, DNA modification hosts, expression hosts, or solid
phase elements for use in the methods claimed herein.



